{ "version": "https://jsonfeed.org/version/1.1", "user_comment": "This feed allows you to read the posts from this site in any feed reader that supports the JSON Feed format. To add this feed to your reader, copy the following URL -- https://eloquentarduino.github.io/category/programming/arduino-machine-learning/feed/json/ -- and add it your reader.", "home_page_url": "https://eloquentarduino.github.io/category/programming/arduino-machine-learning/", "feed_url": "https://eloquentarduino.github.io/category/programming/arduino-machine-learning/feed/json/", "language": "en-US", "title": "Arduino Machine learning – Eloquent Arduino Blog", "description": "Machine learning on Arduino, programming & electronics", "items": [ { "id": "https://eloquentarduino.github.io/?p=1416", "url": "https://eloquentarduino.github.io/2020/12/tinyml-benchmark-table/", "title": "The Grand Benchmark Table of Embedded Machine Learning", "content_html": "
How tiny is TinyML? How fast is TinyML?
\nDo you want to get some REAL numbers on embedded machine learning on Arduino, STM32, ESP32, Seeedstudio boards (and more coming)?
\nThis page will answer all your questions!
\n\n\n
If you're new to this blog, you need to know that (almost one year ago) I set out on a mission to bring machine learning to embedded microcontrollers of all sizes (even the ATtiny85!).
\nTo me, it is just insane to deploy heavyweight Neural Networks to such small devices if you don't need their expressiveness (mainly for image and audio analysis). The vast majority of embedded ML tasks are, in fact, related to sensor readings, which can easily be handled with "traditional" ML algorithms.
\nToday's industry seems to lean more toward Neural Networks, though, so I thought it would be beneficial for you readers to get an actual grasp of the potential of traditional Machine learning algorithms in the embedded context.
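\nOn this blog you can find posts about:
Decision Tree, Random Forest and XGBoost
Gaussian Naive Bayes
SEFR - a binary classifier
PCA for dimensionality reduction
Relevant Vector Machines
SVM for gesture detection
One Class SVM for anomaly detection
\nAll these algorithms go a long way in both accuracy and resource consumption, so (in my opinion) they should be your first choice when developing a new project.
\nTo support my claims I made a huge effort to collect real world data, and now I want to share this data with you.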
\nBefore you ask:
\n"Are Neural Network models benchmarked here?". No.
\n"Will Neural Network models be benchmarked in the future?". Yes, as soon as I'm comfortable with them: I want to create a fair comparison between NN and traditional algorithms.
\nSo now let's move to the contents.
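\nI ran the benchmarks on the boards I have at hand: they were all purchased by me, except for the Arduino Nano BLE Sense (given to me by the Arduino team).
Espressif ESP32
Espressif ESP8266 NodeMCU v1.0
STM32 Nucleo L432KC (Cortex M4)
Seeedstudio XIAO (SAMD21 Cortex M0)
Arduino Nano 33 BLE Sense (Cortex M4F)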
\nI picked a small selection of toy and real world datasets to benchmark the classifiers against (the real world ones were picked from a TinyML Talks presentation when easily available, plus some more from the UCI database almost at random).
\nHere's the list of the benchmarked datasets, with the shape of each dataset (in the format number of samples x number of features x number of classes).
Iris (150 x 4 x 3): from the sklearn package
Wine (178 x 13 x 3): from the sklearn package
Digits (1797 x 64 x 10): from the sklearn package
Human Activity (10299 x 561 x 6)
Sport Activity (4800 x 180 x 10)
Gas Sensor Array Drift (1000 x 128 x 6)
EMG (1648 x 63 x 5)
Gesture Phase Segmentation (1000 x 19 x 5)
Statlog (Vehicle Silhouettes) (846 x 18 x 4)
Mammographic Mass (830 x 4 x 2)
Sensorless Drive Diagnosis (1000 x 48 x 11)
The datasets were chosen to be representative of different domains, and the list will grow in the coming weeks.
\nSome datasets are used as-is, others were pre-processed with very light feature extraction. In detail:
\nHuman Activity features were extracted with a rolling window, and for each window min/max/avg/std/skew/kurtosis were calculated
Sport Activity got the same pre-processing, and the number of activities was reduced from 19 to 10
EMG features were extracted with a rolling window, and for each window the Root Mean Square value was calculated
\nThe reported benchmarks only consider the inference process: feature extraction is not included! Nevertheless, only features with linear time complexity were used, so any MCU should have no problem computing them.
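The rolling-window features described above can be sketched in a few lines of Python. This is only an illustration and not the code used for the benchmarks: the function names, the window size and step, the use of the population standard deviation and the choice of excess kurtosis are all assumptions. Each feature needs only a fixed number of passes over the window, which is why the cost stays linear in the window length.

```python
import math
from statistics import fmean, pstdev


def rolling(signal, size, step):
    """Yield consecutive windows of `size` samples, advancing by `step`."""
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]


def window_features(window):
    """min/max/avg/std/skew/kurtosis of one window (a few linear passes)."""
    mean = fmean(window)
    std = pstdev(window, mu=mean)  # assumption: population standard deviation
    if std == 0:
        skew, kurt = 0.0, 0.0  # assumption: a constant window gets zero moments
    else:
        skew = fmean(((v - mean) / std) ** 3 for v in window)
        kurt = fmean(((v - mean) / std) ** 4 for v in window) - 3.0  # excess kurtosis
    return [min(window), max(window), mean, std, skew, kurt]


def rms(window):
    """Root Mean Square of one window (the EMG-style feature)."""
    return math.sqrt(fmean(v * v for v in window))


# Toy signal; window size and step are arbitrary values for the example.
signal = [1, 2, 3, 4, 5, 6]
windows = list(rolling(signal, size=4, step=2))
features = [window_features(w) for w in windows]
rms_values = [rms(w) for w in windows]
```

On a microcontroller the same quantities would be computed in C over a circular buffer, but the arithmetic is the same.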
\nThe following classifiers are benchmarked:
Decision Tree
Random Forest
XGBoost
Logistic Regression
Gaussian Naive Bayes
\nWhy these classifiers?
\nBecause they're all supported by the micromlgen package, so they can easily be ported to plain C.
\n* XGBoost porting failed on some datasets, so you will see holes in the data. I will correct this in the coming weeks.
\nmicromlgen actually supports Support Vector Machines, too: they are not included because on real world datasets the number of support vectors is so high (hundreds or even thousands) that no single board could handle them.
If you want to stay up to date with the new numbers, subscribe to the newsletter: I promise you won't receive more than 1 mail per month.
\nThis section reports (a selection of) the charts generated from the benchmark results to give you a quick glance at the capabilities of the aforementioned boards and algorithms in terms of performance and accuracy.
\nIf you'd like an interactive view of the data, there's a Colab Notebook that reproduces the charts reported here, where you can explore the data as you like.
\nAt the very end of the article, you can also find a link to the raw CSV file I generated (as you can see, it required A LOT of work to create).
\nThe overall accuracy of each classifier on each dataset (this plot is not bound to any particular board: it is computed "offline").
\n\nComment: many classifiers (Random Forest, XGBoost, Logistic Regression) can easily achieve up to 95+ % accuracy on some datasets with minimal pre-processing, while still scoring 85+ % on more difficult datasets.
\nThese charts plot, for each dataset, how much flash (as a percentage of the total available) the compiled classifier takes (visit the Colab Notebook to see all the charts).
\n\n\nComment: DecisionTree, GaussianNB and Logistic Regression require the least amount of flash. XGBoost is very "flash-intensive"; RandomForest sits in the middle.
\nAs low as 6% of flash size for a fully functional DecisionTree with 85+% accuracy.
\nThese charts plot, for each dataset, how long it takes for the classifier to run (only the classification, no feature extraction!).
\n\n\nComment: DecisionTree is the clear winner here, with minimal inference time (from 0.4 to 30 microseconds), followed by Random Forest. Logistic Regression, XGBoost and GaussianNB are the slowest.
\nAs fast as sub-millisecond inference time for a fully functional DecisionTree with 85+% accuracy.
\nThis plot correlates the inference time vs the classification accuracy. The more upper-left a point is, the better (fast inference time, high accuracy).
\nClick here to open the image at full size
\n\nComment: as already stated, you will see a lot of blue markers (Decision Tree) in the top left, since it is very fast and quite accurate. Moving to the right you can see purple (Logistic Regression) and orange (Random Forest). GaussianNB (red) exhibits quite low accuracy instead.
\nThis plot correlates the inference time vs the (relative) flash requirement. The more lower-left a point is, the better (fast inference time, low flash requirements).
\nClick here to open the image at full size
\n\nComment: Again, we see blue (Decision Tree) is both fast and small, followed by Logistic Regression and Random Forest. Now it is clear that XGBoost (green), while not being the slowest, is the most demanding in terms of flash.
\nI hope this post helped you broaden your view on TinyML, on how tiny it can be, how fast it can be (sub-millisecond inference!), how wide it is.
\nPlease don't hesitate to comment with your opinion on the subject, suggestions of new boards or datasets I should benchmark, or any other idea you have in mind that can contribute to the purpose of this page.
\nAnd don't forget to stay tuned for updates: I already have 2 more boards I will benchmark in the coming days!
\nAs promised, here's the link to the raw benchmarks in CSV format.
\nYou can run your own analysis and visualization on it: if you use it in your own work, please add a link to this post.
\nIn future posts I will share how I collected all those numbers, so subscribe to the newsletter to stay up to date!
\nThe post The Grand Benchmark Table of Embedded Machine Learning first appeared on Eloquent Arduino Blog.
\n", "content_text": "How tiny is TinyML? How fast is TinyML?\nDo you want to get some REAL numbers on embedded machine learning on Arduino, STM32, ESP32, Seeedstudio boards (and more coming)? \nThis page will answer all your questions!\n\n\nBackground\nIf you're new to this blog, you need to know that (almost one year ago) I settled on a mission to bring machine learning to embedded microcontrollers of all sizes (even the Attiny85!).\nTo me, it is just insane to deploy heavyweight Neural Networks to such small devices, if you don't need their expressiveness (mainly image and audio analysis). The vast majority of embedded ML tasks is, in fact, related to sensors' readings, which can easily be solved with "traditional" ML algorithms.\nToday's industry seems to be more leaned toward Neural Networks, though, so I thought it would be beneficial for you readers to get an actual grasp on the potential of traditional Machine learning algorithms in the embedded context.\nOn this blog you can find posts about:\n\nDecision Tree, Random Forest and XGBoost\nGaussian Naive Bayes\nSEFR - a binary classifier\nPCA for dimensionality reduction\nRelevant Vector Machines\nSVM for gesture detection\nOne Class SVM for anomaly detection\n\nAll these algorithms go a long way in both accuracy and resource comsumption, so (in my opinion) they should be your first choice when developing a new project.\nTo support my claimings I made a huge effort to collect real world data, and now I want to share this data with you.\nBefore you ask:\n"Are Neural Networks models benchmarked here?". No.\n"Will Neural Networks model be benchmarked in the future?". 
Yes, as soon as I'm comfortable with them: I want to create a fair comparison between NN and traditional algorithms.\nSo now let's move to the contents.\nThe boards\nI run the benchmarks on the boards I have at hand: they were all purchased by me, except for the Arduino Nano BLE Sense (given to me by the Arduino team).\n\nEspressif ESP32\nEspressif ESP8266 NodeMCU v1.0\nSTM32 Nucleo L432KC (Cortex M4)\nSeeedstudio XIAO (SAMD21 Cortex M0)\nArduino Nano 33 BLE Sense (Cortex M4F)\n\nThe datasets\nI picked a small selection of toy and real world datasets to benchmark the classifiers against (the real world ones were picked from a TinyML Talks presentation when easily available, plus some more from the UCI database almost at random).\nHere's the list of the benchmarked datasets, with the shape of the dataset (in the format number of samples x number of features x number of classes).\n\nIris (150 x 4 x 3): from the sklearn package\nWine (178 x 13 x 3): from the sklearn package\nDigits (1797 x 64 x 10): from the sklearn package\nHuman Activity (10299 x 561 x 6)\nSport Activity (4800 x 180 x 10)\nGas Sensor Array Drift (1000 x 128 x 6)\nEMG (1648 x 63 x 5 )\nGesture Phase Segmentaion (1000 x 19 x 5)\nStatlog (Vehicle Silhouettes) (846 x 18 x 4)\nMammographic Mass (830 x 4 x 2)\nSensorless Drive Diagnosis (1000 x 48 x 11)\n\nThe datasets are chosen to be representative of different domains and the list will grow in the next weeks.\nSome datasets are used as-is, others were pre-processed with very light feature extraction. In detail:\n\nHuman Activity features were extracted with a rolling window, and for each window min/max/avg/std/skew/kurtosis were calculated\nSport Activity got the same pre-processing, and the number of actvities was reduced from 19 to 10\nEMG features were extracted with a rolling window, and for each window the Root Mean Square value was calculated\n\nThe reported benchmarks only consider the inference process: any feature extraction is not included! 
Nevertheless, only features with linear time complexity were used, so any MCU will have no problem in computing them.\nThe classifiers\nThe following classifiers are benchmarked:\n\nDecision Tree\nRandom Forest\nXGBoost\nLogistic Regression\nGaussian Naive Bayes\n\nWhy these classifiers?\nBecause they're all supported by the micromlgen package, so they can easily be ported to plain C.\n* XGBoost porting failed on some datasets, so you will see holes in the data. I will correct this in the next weeks\nmicromlgen actually supports Support Vector Machines, too: it is not included because on real world datasets the number of support vector is so high (hundreds or even thousands) that no single board could handle that.\nIf you want to stay up to date with the new numbers, subscribe to the newsletter: I promise you won't receive more than 1 mail per month.\n\r\n\r\n\r\n \r\n\tFinding this content useful?\r\n\r\n\t\r\n\r\n\t\r\n\t\t\r\n\t\t\r\n\t \r\n \r\n \r\n \r\n\r\n\r\n\r\n\nThe Results\nThis section reports (a selection of) the charts generated from the benchmark results to give you a quick glance of the capabilities of the aforementioned boards and algorithms in terms of performance and accuracy.\nIf you like an interactive view of the data, there's a Colab Notebook that reproduces the charts reported here, where you can interact with the data as you like.\nAt the very end of the article, you can also find a link to the raw CSV file I generated (as you can see, it required A LOT of work to create).\nAccuracy\nThe overall accuracy of each classifier on each dataset (this plot is not bounded to any particular board, it is computed "offline").\n\nComment: many classifiers (Random Forest, XGBoost, Logistic Regression) can easily achieve up to 95+ % accuracy on some datasets with minimal pre-processing, while still scoring 85+ % on more difficult datasets.\nFlash percent\nThese charts plot, for each dataset, how much flash (in percent on the total available) it takes for 
the classifier to compile (visit the Colab Notebook to see all the charts).\n\n\nComment: DecisionTree, GaussianNB and Logistic Regression require the least amount of flash. XGBoost is very "flash-intensive"; RandomForest sits in the middle.\nHow tiny can TinyML be?\nAs low as 6% of flash size for a fully functional DecisionTree with 85+% accuracy.\nInference time\nThese charts plot, for each dataset, how long it takes for the classifier to run (only the classification, no feature extraction!).\n\n\nComment: DecisionTree is the clear winner here, with minimal inference time (from 0.4 to 30 microseconds), followed by Random Forest. Logistic Regression, XGBoost and GaussianNB are the slowest.\nHow fast can TinyML be?\nAs fast as sub-millisecond inference time for a fully functional DecisionTree with 85+% accuracy.\nInference time vs Accuracy\nThis plot correlates the inference time vs the classification accuracy. The more upper-left a point is, the better (fast inference time, high accuracy).\nClick here to open the image at full size\n\nComment: as already stated, you will see a lot of blue markers (Decision Tree) in the top left, since it is very fast and quite accurate. Moving to the right you can see purple (Logistic Regression) and orange (Random Forest). GaussianNB (red) exhibits quite low accuracy instead.\nInference time vs Flash percent\nThis plot correlates the inference time vs the the (relative) flash requirement. The more lower-left a point is, the better (fast inference time, low flash requirements).\nClick here to open the image at full size\n\nComment: Again, we see blue (Decision Tree) is both fast and small, followed by Logistic Regression and Random Forest. 
Now it is clear that XGBoost (green), while not being the slowest, is the more demanding in terms of flash.\nConclusions\nI hope this post helped you broaden your view on TinyML, on how tiny it can be, how fast it can be (sub-millisecond inference!), how wide it is.\nPlease don't hesitate to comment with your opinion on the subject, suggestions of new boards or datasets I should benchmark, or any other idea you have in mind that can contribute to the purpose of this page.\nAnd don't forget to stay tuned for the updates: I already have 2 more boards I will benchmark in the next days!\n\nAs promised, here's the link to the raw benchmarks in CSV format.\nYou can run your own analysis and visualization on it: if you use it in your own work, please add a link to this post.\nIn future posts I will share how I collected all those numbers, so subscribe to the newsletter to stay up to date!\n\r\n\r\n\r\n \r\n\tFinding this content useful?\r\n\r\n\t\r\n\r\n\t\r\n\t\t\r\n\t\t\r\n\t \r\n \r\n \r\n \r\n\r\n\r\n\r\n\nL'articolo The Grand Benchmark Table of Embedded Machine Learning proviene da Eloquent Arduino Blog.", "date_published": "2020-12-16T21:31:10+01:00", "date_modified": "2020-12-20T17:13:28+01:00", "authors": [ { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" } ], "author": { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" }, "tags": [ "Arduino Machine learning" ] }, { "id": "https://eloquentarduino.github.io/?p=1264", "url": "https://eloquentarduino.github.io/2020/10/decision-tree-random-forest-and-xgboost-on-arduino/", "title": "Decision Tree, Random Forest and XGBoost on Arduino", "content_html": "You will be surprised by how much accuracy you can achieve in just a few kylobytes of resources: Decision Tree, Random Forest and XGBoost 
(Extreme Gradient Boosting) are now available on your microcontrollers: highly RAM-optimized implementations for super-fast classification on embedded devices.
\n\n\n
Decision Tree is without doubt one of the most well-known classification algorithms out there. It is so simple to understand that it was probably the first classifier you encountered in any Machine Learning course.
\nI won't go into the details of how a Decision Tree classifier trains and selects the splits for the input features: here I will explain how a RAM-efficient port of such a classifier is implemented.
\nFor an introduction, visit Wikipedia; for a more in-depth guide, visit KDNuggets.
\nSince RAM is the scarcest resource in the vast majority of microcontrollers, we're willing to sacrifice program space (a.k.a. flash) in favor of memory (a.k.a. RAM). The smart way to port a Decision Tree classifier from Python to C is therefore "hard-coding" the splits in code, without keeping any reference to them in variables.
\nHere's what it looks like for a Decision tree that classifies the Iris dataset.
\nAs you can see, we're using 0 bytes of RAM to get the classification result, since no variable is being allocated. On the other hand, the program space will grow almost linearly with the number of splits.
\nSince program space is often much larger than RAM on microcontrollers, this implementation exploits its abundance to deploy larger models. How much larger? It depends on the flash size available: many new-generation boards (Arduino Nano 33 BLE Sense, ESP32, ST Nucleo...) have 1 MB of flash, which will hold tens of thousands of splits.
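The idea of hard-coding the splits can be sketched with a tiny code generator. This is not micromlgen's actual implementation: the nested-dict tree format and the names `TREE`, `emit_c`, `port_tree` and `predict` are assumptions made for illustration, and the tree is a simplified, rounded version of the Iris example. Every split becomes an `if`/`else` in the generated C source, so the thresholds live in flash as part of the program and the C function declares no variables.

```python
# Hypothetical tree format (NOT micromlgen's internal representation):
# a leaf is an int class label; an internal node is a dict holding the
# feature index, the threshold and the two subtrees.
TREE = {
    "feature": 3, "threshold": 0.8,
    "left": 0,
    "right": {
        "feature": 3, "threshold": 1.75,
        "left": {"feature": 2, "threshold": 4.95, "left": 1, "right": 2},
        "right": 2,
    },
}


def emit_c(node, depth=1):
    """Return the C source lines that hard-code the splits of `node`."""
    pad = "    " * depth
    if isinstance(node, int):  # leaf: return the class label directly
        return [f"{pad}return {node};"]
    lines = [f"{pad}if (x[{node['feature']}] <= {node['threshold']}) {{"]
    lines += emit_c(node["left"], depth + 1)
    lines += [f"{pad}}}", f"{pad}else {{"]
    lines += emit_c(node["right"], depth + 1)
    lines += [f"{pad}}}"]
    return lines


def port_tree(node):
    """Wrap the generated splits in a C function with no local variables."""
    return "\n".join(["int predict(float *x) {"] + emit_c(node) + ["}"])


def predict(node, x):
    """Reference Python traversal, handy for checking the generated logic."""
    while not isinstance(node, int):
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node


c_source = port_tree(TREE)
print(c_source)
```

The generated source grows by one `if`/`else` pair per split, which matches the observation that program space grows almost linearly with the number of splits.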
\nRandom Forest is just many Decision Trees joined together in a voting scheme. The core idea is that of "the wisdom of the crowd": if many trees vote for a given class (having been trained on different subsets of the training set), that class is probably the true class.
\nTowards Data Science has a more detailed guide on Random Forest and how it balances the trees with the bagging technique.
\nJust like Decision Trees, Random Forest gets the exact same implementation with 0 bytes of RAM required (it actually needs a small array with one counter per class to store the votes, but that's really negligible): it just hard-codes all its component trees.
\nExtreme Gradient Boosting is "Gradient Boosting on steroids" and has gained much attention from the Machine learning community due to its top results in many data competitions.
\nYou can read the original paper about XGBoost here. For a discursive description head to KDNuggets; if you want some more math, refer to this blog post on Medium.
\nIf you followed my earlier posts on Gaussian Naive Bayes, SEFR, Relevant Vector Machine and Support Vector Machines, you already know how to port these new classifiers.
\nIf you're new, you will need a couple of things:
\ninstall the micromlgen package with
\npip install micromlgen
\n(optionally, if you want to use Extreme Gradient Boosting) install the xgboost package with
\npip install xgboost
\nuse the micromlgen.port function to generate your plain C code
\nfrom micromlgen import port\nfrom sklearn.tree import DecisionTreeClassifier\nfrom sklearn.datasets import load_iris\n\nclf = DecisionTreeClassifier()\nX, y = load_iris(return_X_y=True)\nclf.fit(X, y)\nprint(port(clf))
\nYou can then copy-paste the C code and import it in your sketch.
\nOnce you have the classifier code, create a new project named TreeClassifierExample and copy the classifier code into a file named DecisionTree.h (or RandomForest.h or XGBoost.h, depending on the model you chose).
Then copy the following to the main .ino file.
\n#include "DecisionTree.h"\n\nEloquent::ML::Port::DecisionTree clf;\n\nvoid setup() {\n Serial.begin(115200);\n Serial.println("Begin");\n}\n\nvoid loop() {\n float irisSample[4] = {6.2, 2.8, 4.8, 1.8};\n\n Serial.print("Predicted label (you should see '2'): ");\n Serial.println(clf.predict(irisSample));\n delay(1000);\n}
\nHow do the 3 classifiers compare against each other?
\nWe will evaluate a few key points (training time, accuracy, needed RAM, needed Flash) for each classifier on a variety of datasets. I will report the results for RAM and Flash on the old-generation Arduino Nano, so you should look at the relative figures more than the absolute ones.
\nDataset | Classifier | Training time (s) | Accuracy | RAM (bytes) | Flash (bytes)
---|---|---|---|---|---
Gas Sensor Array Drift Dataset | Decision Tree | 1.6 | 0.781 \u00b1 0.12 | 290 | 5722
13910 samples x 128 features | Random Forest | 3 | 0.865 \u00b1 0.083 | 290 | 6438
6 classes | XGBoost | 18.8 | 0.878 \u00b1 0.074 | 290 | 6506
Gesture Phase Segmentation Dataset | Decision Tree | 0.1 | 0.943 \u00b1 0.005 | 290 | 5638
10000 samples x 19 features | Random Forest | 0.7 | 0.970 \u00b1 0.004 | 306 | 6466
5 classes | XGBoost | 18.9 | 0.969 \u00b1 0.003 | 306 | 6536
Drive Diagnosis Dataset | Decision Tree | 0.6 | 0.946 \u00b1 0.005 | 306 | 5850
10000 samples x 48 features | Random Forest | 2.6 | 0.983 \u00b1 0.003 | 306 | 6526
11 classes | XGBoost | 68.9 | 0.977 \u00b1 0.005 | 306 | 6698
* all datasets are taken from the UCI Machine Learning datasets archive
\nI'm collecting more data for a complete benchmark, but in the meantime you can see that Random Forest and XGBoost are on par in accuracy, except that XGBoost takes roughly 6 to 27 times longer to train than Random Forest on these datasets.
\nI've never used XGBoost, so I may be missing some tuning parameters, but for now Random Forest remains my favourite classifier.
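As a quick check of the training-time gap, the ratios can be computed directly from the table above. The snippet below only copies the Random Forest and XGBoost training times from the table (decimal commas read as decimal points); the variable names are made up for the example.

```python
# Training times in seconds, copied from the benchmark table above.
training_time = {
    "Gas Sensor Array Drift": {"Random Forest": 3.0, "XGBoost": 18.8},
    "Gesture Phase Segmentation": {"Random Forest": 0.7, "XGBoost": 18.9},
    "Drive Diagnosis": {"Random Forest": 2.6, "XGBoost": 68.9},
}

# How many times longer XGBoost takes to train than Random Forest.
ratios = {
    name: times["XGBoost"] / times["Random Forest"]
    for name, times in training_time.items()
}

for name, ratio in ratios.items():
    print(f"{name}: XGBoost takes {ratio:.1f}x as long to train as Random Forest")
```

On these three datasets the ratio ranges from about 6x (Gas Sensor Array Drift) to about 27x (Gesture Phase Segmentation).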
\n// example IRIS dataset classification with Decision Tree\nint predict(float *x) {\n if (x[3] <= 0.800000011920929) {\n return 0;\n }\n else {\n if (x[3] <= 1.75) {\n if (x[2] <= 4.950000047683716) {\n if (x[0] <= 5.049999952316284) {\n return 1;\n }\n else {\n return 1;\n }\n }\n else {\n return 2;\n }\n }\n else {\n if (x[2] <= 4.950000047683716) {\n return 2;\n }\n else {\n return 2;\n }\n }\n }\n}
\n// example IRIS dataset classification with Random Forest of 3 trees\n\nint predict(float *x) {\n uint16_t votes[3] = { 0 };\n\n // tree #1\n if (x[0] <= 5.450000047683716) {\n if (x[1] <= 2.950000047683716) {\n votes[1] += 1;\n }\n else {\n votes[0] += 1;\n }\n }\n else {\n if (x[0] <= 6.049999952316284) {\n if (x[3] <= 1.699999988079071) {\n if (x[2] <= 3.549999952316284) {\n votes[0] += 1;\n }\n else {\n votes[1] += 1;\n }\n }\n else {\n votes[2] += 1;\n }\n }\n else {\n if (x[3] <= 1.699999988079071) {\n if (x[3] <= 1.449999988079071) {\n if (x[0] <= 6.1499998569488525) {\n votes[1] += 1;\n }\n else {\n votes[1] += 1;\n }\n }\n else {\n votes[1] += 1;\n }\n }\n else {\n votes[2] += 1;\n }\n }\n }\n\n // tree #2\n if (x[0] <= 5.549999952316284) {\n if (x[2] <= 2.449999988079071) {\n votes[0] += 1;\n }\n else {\n if (x[2] <= 3.950000047683716) {\n votes[1] += 1;\n }\n else {\n votes[1] += 1;\n }\n }\n }\n else {\n if (x[3] <= 1.699999988079071) {\n if (x[1] <= 2.649999976158142) {\n if (x[3] <= 1.25) {\n votes[1] += 1;\n }\n else {\n votes[1] += 1;\n }\n }\n else {\n if (x[2] <= 4.1499998569488525) {\n votes[1] += 1;\n }\n else {\n if (x[0] <= 6.75) {\n votes[1] += 1;\n }\n else {\n votes[1] += 1;\n }\n }\n }\n }\n else {\n if (x[0] <= 6.0) {\n votes[2] += 1;\n }\n else {\n votes[2] += 1;\n }\n }\n }\n\n // tree #3\n if (x[3] <= 1.75) {\n if (x[2] <= 2.449999988079071) {\n votes[0] += 1;\n }\n else {\n if (x[2] <= 4.8500001430511475) {\n if (x[0] <= 5.299999952316284) {\n votes[1] += 1;\n }\n else {\n votes[1] += 1;\n }\n }\n else {\n votes[1] += 1;\n }\n }\n }\n else {\n if (x[0] <= 5.950000047683716) {\n votes[2] += 1;\n }\n else {\n votes[2] += 1;\n }\n }\n\n // return argmax of votes\n uint8_t classIdx = 0;\n float maxVotes = votes[0];\n\n for (uint8_t i = 1; i < 3; i++) {\n if (votes[i] > maxVotes) {\n classIdx = i;\n maxVotes = votes[i];\n }\n }\n\n return classIdx;\n}
\nThe post Decision Tree, Random Forest and XGBoost on Arduino first appeared on Eloquent Arduino Blog.
\n", "content_text": "You will be surprised by how much accuracy you can achieve in just a few kylobytes of resources: Decision Tree, Random Forest and XGBoost (Extreme Gradient Boosting) are now available on your microcontrollers: highly RAM-optmized implementations for super-fast classification on embedded devices.\n\n\nDecision Tree\nDecision Tree is without doubt one of the most well-known classification algorithms out there. It is so simple to understand that it was probably the first classifier you encountered in any Machine Learning course.\nI won't go into the details of how a Decision Tree classifier trains and selects the splits for the input features: here I will explain how a RAM-efficient porting of such a classifier is implemented.\nTo an introduction visit Wikipedia; for a more in-depth guide visit KDNuggets.\nSince we're willing to sacrifice program space (a.k.a flash) in favor of memory (a.k.a RAM), because RAM is the most scarce resource in the vast majority of microcontrollers, the smart way to port a Decision Tree classifier from Python to C is "hard-coding" the splits in code, without keeping any reference to them into variables.\nHere's what it looks like for a Decision tree that classifies the Iris dataset.\nAs you can see, we're using 0 bytes of RAM to get the classification result, since no variable is being allocated. On the other side, the program space will grow almost linearly with the number of splits.\nSince program space is often much greater than RAM on microcontrollers, this implementation exploits its abundance to be able to deploy larger models. How much large? It will depend on the flash size available: many new generations board (Arduino Nano 33 BLE Sense, ESP32, ST Nucleus...) have 1 Mb of flash, which will hold tens of thousands of splits. \nRandom Forest\nRandom Forest is just many Decision Trees joined together in a voting scheme. 
The core idea is that of "the wisdom of the corwd", such that if many trees vote for a given class (having being trained on different subsets of the training set), that class is probably the true class.\nTowards Data Science has a more detailed guide on Random Forest and how it balances the trees with thebagging tecnique.\nAs easy as Decision Trees, Random Forest gets the exact same implementation with 0 bytes of RAM required (it actually needs as many bytes as the number of classes to store the votes, but that's really negligible): it just hard-codes all its composing trees.\nXGBoost (Extreme Gradient Boosting)\nExtreme Gradient Boosting is "Gradient Boosting on steroids" and has gained much attention from the Machine learning community due to its top results in many data competitions.\n\n"gradient boosting" refers to the process of chaining a number of trees so that each tree tries to learn from the errors of the previous\n"extreme" refers to many software and hardware optimizations that greatly reduce the time it takes to train the model\n\nYou can read the original paper about XGBoost here. 
For a discursive description head to KDNuggets, if you want some more math refer to this blog post on Medium.\nPorting to plain C\nIf you followed my earlier posts on Gaussian Naive Bayes, SEFR, Relevant Vector Machine and Support Vector Machines, you already know how to port these new classifiers.\nIf you're new, you will need a couple things:\n\ninstall the micromlgen package with \n\npip install micromlgen\n\n(optionally, if you want to use Extreme Gradient Boosting) install the xgboost package with \n\npip install xgboost\n\nuse the micromlgen.port function to generate your plain C code\n\nfrom micromlgen import port\nfrom sklearn.tree import DecisionTreeClassifier\nfrom sklearn.datasets import load_iris\n\nclf = DecisionTreeClassifier()\nX, y = load_iris(return_X_y=True)\nclf.fit(X, y)\nprint(port(clf))\nYou can then copy-past the C code and import it in your sketch.\nUsing in the Arduino sketch\nOnce you have the classifier code, create a new project named TreeClassifierExample and copy the classifier code into a file named DecisionTree.h (or RandomForest.h or XGBoost.h depending on the model you chose).\nThe copy the following to the main ino file.\n#include "DecisionTree.h"\n\nEloquent::ML::Port::DecisionTree clf;\n\nvoid setup() {\n Serial.begin(115200);\n Serial.println("Begin");\n}\n\nvoid loop() {\n float irisSample[4] = {6.2, 2.8, 4.8, 1.8};\n\n Serial.print("Predicted label (you should see '2': ");\n Serial.println(clf.predict(irisSample));\n delay(1000);\n}\nBechmarks\nHow do the 3 classifiers compare against each other?\nWe will evaluate a few keypoints:\n\ntraining time\naccuracy\nneeded RAM\nneeded Flash\n\nfor each classifier on a variety of datasets. 
I will report the results for RAM and Flash on the Arduino Nano old generation, so you should consider more the relative figures than the absolute ones.\n\n\n\nDataset\nClassifier\nTraining time (s)\nAccuracy\nRAM (bytes)\nFlash (bytes)\n\n\n\n\nGas Sensor Array Drift Dataset \nDecision Tree\n1,6\n0.781 \u00b1 0.12\n290\n5722\n\n\n13910 samples x 128 features\nRandom Forest\n3\n0.865 \u00b1 0.083\n290\n6438\n\n\n6 classes\nXGBoost\n18,8\n0.878 \u00b1 0.074\n290\n6506\n\n\nGesture Phase Segmentation Dataset\nDecision Tree\n0,1\n0.943 \u00b1 0.005\n290\n5638\n\n\n10000 samples x 19 features\nRandom Forest\n0,7\n0.970 \u00b1 0.004\n306\n6466\n\n\n5 classes\nXGBoost\n18,9\n0.969 \u00b1 0.003\n306\n6536\n\n\nDrive Diagnosis Dataset\nDecision Tree\n0,6\n0.946 \u00b1 0.005\n306\n5850\n\n\n10000 samples x 48 features\nRandom Forest\n2,6\n0.983 \u00b1 0.003\n306\n6526\n\n\n11 classes\nXGBoost\n68,9\n0.977 \u00b1 0.005\n306\n6698\n\n\n\n* all datasets are taken from the UCI Machine Learning datasets archive\nI'm collecting more data for a complete benchmark, but in the meantime you can see that both Random Forest and XGBoost are on par: if not that XGBoost takes 5 to 25 times longer to train.\nI've never used XGBoost, so I may be missing some tuning parameters, but for now Random Forest remains my favourite classifier.\nCode listings\n// example IRIS dataset classification with Decision Tree\nint predict(float *x) {\n if (x[3] <= 0.800000011920929) {\n return 0;\n }\n else {\n if (x[3] <= 1.75) {\n if (x[2] <= 4.950000047683716) {\n if (x[0] <= 5.049999952316284) {\n return 1;\n }\n else {\n return 1;\n }\n }\n else {\n return 2;\n }\n }\n else {\n if (x[2] <= 4.950000047683716) {\n return 2;\n }\n else {\n return 2;\n }\n }\n }\n}\n// example IRIS dataset classification with Random Forest of 3 trees\n\nint predict(float *x) {\n uint16_t votes[3] = { 0 };\n\n // tree #1\n if (x[0] <= 5.450000047683716) {\n if (x[1] <= 2.950000047683716) {\n votes[1] += 1;\n }\n else {\n 
votes[0] += 1;\n }\n }\n else {\n if (x[0] <= 6.049999952316284) {\n if (x[3] <= 1.699999988079071) {\n if (x[2] <= 3.549999952316284) {\n votes[0] += 1;\n }\n else {\n votes[1] += 1;\n }\n }\n else {\n votes[2] += 1;\n }\n }\n else {\n if (x[3] <= 1.699999988079071) {\n if (x[3] <= 1.449999988079071) {\n if (x[0] <= 6.1499998569488525) {\n votes[1] += 1;\n }\n else {\n votes[1] += 1;\n }\n }\n else {\n votes[1] += 1;\n }\n }\n else {\n votes[2] += 1;\n }\n }\n }\n\n // tree #2\n if (x[0] <= 5.549999952316284) {\n if (x[2] <= 2.449999988079071) {\n votes[0] += 1;\n }\n else {\n if (x[2] <= 3.950000047683716) {\n votes[1] += 1;\n }\n else {\n votes[1] += 1;\n }\n }\n }\n else {\n if (x[3] <= 1.699999988079071) {\n if (x[1] <= 2.649999976158142) {\n if (x[3] <= 1.25) {\n votes[1] += 1;\n }\n else {\n votes[1] += 1;\n }\n }\n else {\n if (x[2] <= 4.1499998569488525) {\n votes[1] += 1;\n }\n else {\n if (x[0] <= 6.75) {\n votes[1] += 1;\n }\n else {\n votes[1] += 1;\n }\n }\n }\n }\n else {\n if (x[0] <= 6.0) {\n votes[2] += 1;\n }\n else {\n votes[2] += 1;\n }\n }\n }\n\n // tree #3\n if (x[3] <= 1.75) {\n if (x[2] <= 2.449999988079071) {\n votes[0] += 1;\n }\n else {\n if (x[2] <= 4.8500001430511475) {\n if (x[0] <= 5.299999952316284) {\n votes[1] += 1;\n }\n else {\n votes[1] += 1;\n }\n }\n else {\n votes[1] += 1;\n }\n }\n }\n else {\n if (x[0] <= 5.950000047683716) {\n votes[2] += 1;\n }\n else {\n votes[2] += 1;\n }\n }\n\n // return argmax of votes\n uint8_t classIdx = 0;\n float maxVotes = votes[0];\n\n for (uint8_t i = 1; i < 3; i++) {\n if (votes[i] > maxVotes) {\n classIdx = i;\n maxVotes = votes[i];\n }\n }\n\n return classIdx;\n}\nL'articolo Decision Tree, Random Forest and XGBoost on Arduino proviene da Eloquent Arduino Blog.", "date_published": "2020-10-19T19:31:02+02:00", "date_modified": "2020-12-10T12:26:23+01:00", "authors": [ { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": 
"http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" } ], "author": { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" }, "tags": [ "microml", "ml", "Arduino Machine learning", "Arduino Machine Learning tutorial" ] }, { "id": "https://eloquentarduino.github.io/?p=1297", "url": "https://eloquentarduino.github.io/2020/09/principal-fft-components-as-efficient-features-extrator/", "title": "\u201cPrincipal\u201d FFT components as efficient features extrator", "content_html": "Fourier Transform is probably the most well known algorithm for feature extraction from time-dependent data (in particular speech data), where frequency holds a great deal of information. Sadly, computing the transform over the whole spectrum of the signal still requires O(NlogN) with the best implementation (FFT - Fast Fourier Transform); we would like to achieve faster computation on our microcontrollers.
\nIn this post I propose a partial, naive linear-time implementation of the Fourier Transform you can use to extract features from your data for Machine Learning models.
\n\n\n
DISCLAIMER
\nThe contents of this post represent my own knowledge and are not supported by any academic work (as far as I know). It may really be the case that the findings of my work don't apply to your own projects; yet, I think this idea can turn useful in solving certain kind of problems.
\nFourier transform is used to describe a signal over its entire frequency range. This is useful in a number of applications, but here we're focused on the FT for the sole purpose of extracting features to be used with Machine learning models.
\nFor this reason, we don't actually need a full description of the input signal: we're only interested in extracting some kind of signature that a ML model can use to distinguish among the different classes. Noticing that in a signal spectrum most frequencies have a low magnitude (as you can see in the picture above), the idea to only keep the most important frequencies came to my mind as a mean to speed up the computation on resource constrained microcontrollers.
\nI was thinking to a kind of PCA (Principal Component Analysis), but using FFT spectrum as features.
\nSince we will have a training set with the raw signals, we would like to select the most prominent frequencies among all the samples and apply the computation only on those: even using the naive implementation of FFT, this will yield a linear-time implementation.
\nHow does this Principal FFT compare to, let's say, PCA as a dimensionality reduction algorithm w.r.t model accuracy? Let's see the numbers!
\n\nDownload the Principal FFT benchmark spreadsheet
\nI couldn't find many examples of the kind of datasets I wished to test, but in the image you can see different types of data:
\nWe can note a couple findings:
\nFrom even this simple analysis you should be convinced that Principal FFT can be (under certain cases) a fast, performant features extractor for your projects that involve time-dependant data.
\nI created a Python package to use Principal FFT, called principal-fft
.
pip install principal-fft
\nThe class follows the Transformer
API from scikit-learn
, so it has fit
and transform
methods.
from principalfft import PrincipalFFT\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.datasets import load_digits\nfrom sklearn.ensemble import RandomForestClassifier\n\nmnist = load_digits()\nX, y = mnist.data, mnist.target\nXfft = PrincipalFFT(n_components=10).fit_transform(X)\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)\nXfft_train, Xfft_test, y_train, y_test = train_test_split(Xfft, y, test_size=0.3)\n\nclf = RandomForestClassifier(50, min_samples_leaf=5).fit(X_train, y_train)\nprint("Raw score", clf.score(X_test, y_test))\n\nclf = RandomForestClassifier(50, min_samples_leaf=5).fit(Xfft_train, y_train)\nprint("FFT score", clf.score(Xfft_test, y_test))
\nMy results are 0.09
for raw data and 0.78
for FFT transformed: quite a big difference!
As with any dimensionality reduction, n_components
is a hyperparameter you have to tune for your specific project: from my experiments, you shouldn't go lower than <code>8</code>
to achieve a reasonable accuracy.
So, now that we tested our Principal FFT transformer in Python and achieved good results, how do we use it on our microcontroller? Of course with the micromlgen
porter: it is now (version 1.1.9
) able to port PrincipalFFT objects to plain C.
pip install micromlgen==1.1.9
\nWhat does the C code look like?
\nvoid principalFFT(float *features, float *fft) {\n // apply principal FFT (naive implementation for the top 10 frequencies only)\n const int topFrequencies[] = { 0, 8, 17, 16, 1, 9, 2, 7, 15, 6 };\n\n for (int i = 0; i < 10; i++) {\n const int k = topFrequencies[i];\n const float harmonic = 0.09817477042468103 * k;\n float re = 0;\n float im = 0;\n\n // optimized case\n if (k == 0) {\n for (int n = 0; n < 64; n++) {\n re += features[n];\n }\n }\n\n else {\n for (int n = 0; n < 64; n++) {\n const float harmonic_n = harmonic * n;\n const float cos_n = cos(harmonic_n);\n const float sin_n = sin(harmonic_n);\n re += features[n] * cos_n;\n im -= features[n] * sin_n;\n }\n }\n\n fft[i] = sqrt(re * re + im * im);\n }\n}
\nThis is the most direct porting available.
\nIn the Benchmarks section, we'll see how this implementation can be speed-up with alternative implementations.
\nThe following table reports the benchmark on the MNIST dataset (64 features) with 10 principal FFT components vs various tecniques to decrease the computation time at the expense of memory usage.
\nAlgorithm | \nFlash (Kb) | \nExecution time (micros) | \n
---|---|---|
None | \n137420 | \n- | \n
arduinoFFT library | \n147812 | \n3200 | \n
principalFFT | \n151404 | \n4400 | \n
principalFFT w/ cos+sin LUT | \n152124 | \n900 | \n
principalFFT w/ cos LUT + sin sign LUT | \n150220 | \n1250 | \n
*all the benchmarks were run on the Arduino 33 Nano BLE Sense
\nSome thoughts:
\nprincipalFFT w/ cos+sin LUT
means I pre-compute the values of sin
and cos
at compile time, so there's no computation on the board; of course these lookup tables will eat some memoryprincipalFFT w/ cos LUT + sin sign LUT
means I pre-compute the cos
values only and compute sin
using sqrt(1 - cos(x)^2)
; it adds some microseconds to the computation, but requires less memoryarduinoFFT library
is faster than principalFFT
in the execution time and requires less memory, even if principalFFT
is only computing 10 frequencies: I need to investigate how it can achieve such performance</li>\n</ul>\n<p>You can activate the LUT functionality with:</p>
\nfrom micromlgen import port\nfrom principalfft import PrincipalFFT\n\nfft = PrincipalFFT(n_components=10).fit(X)\n\n# cos lookup, sin computed\nport(fft, lookup_cos=True)\n\n# cos + sin lookup\nport(fft, lookup_cos=True, lookup_sin=True)
\nHere's how the C code looks like with LUT.
\nvoid principalFFT(float *features, float *fft) {\n // apply principal FFT (naive implementation for the top N frequencies only)\n const int topFrequencies[] = { 0, 8, 17, 16, 1, 9, 2, 7, 15, 6 };\n const float cosLUT[10][64] = {\n { 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0},\n { 1.0, 0.7071, 6.1232e-17, -0.7071, -1.0, -0.7071, -1.8369e-16, 0.7071, 1.0, 0.7071, 3.0616e-16, -0.7071, -1.0, -0.7071, -4.2862e-16, 0.7071, 1.0, 0.7071, 5.5109e-16, -0.7071, -1.0, -0.7071, -2.4499e-15, 0.7071, 1.0, 0.7071, -9.8033e-16, -0.7071, -1.0, -0.7071, -2.6948e-15, 0.7071, 1.0, 0.7071, -7.3540e-16, -0.7071, -1.0, -0.7071, -2.9397e-15, 0.7071, 1.0, 0.7071, -4.9047e-16, -0.7071, -1.0, -0.7071, -3.1847e-15, 0.7071, 1.0, 0.7071, -2.4554e-16, -0.7071, -1.0, -0.7071, -3.4296e-15, 0.7071, 1.0, 0.7071, -6.1898e-19, -0.7071, -1.0, -0.7071, -3.6745e-15, 0.7071}, ... 
};\n const bool sinLUT[10][64] = {\n { false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false},\n { false, true, true, true, true, false, false, false, false, true, true, true, true, false, false, false, false, true, true, true, true, false, false, false, false, true, true, true, true, false, false, false, false, true, true, true, true, false, false, false, false, true, true, true, true, false, false, false, false, true, true, true, false, false, false, false, false, true, true, true, true, false, false, false}, ...};\n\n for (int i = 0; i < 10; i++) {\n const int k = topFrequencies[i];\n const float harmonic = 0.09817477042468103 * k;\n float re = 0;\n float im = 0;\n // optimized case\n if (k == 0) {\n for (int n = 0; n < 64; n++) {\n re += features[n];\n }\n }\n\n else {\n for (int n = 0; n < 64; n++) {\n const float cos_n = cosLUT[i][n];\n const float sin_n = sinLUT[i][n] ? sqrt(1 - cos_n * cos_n) : -sqrt(1 - cos_n * cos_n);\n re += features[n] * cos_n;\n im -= features[n] * sin_n;\n }\n }\n\n fft[i] = sqrt(re * re + im * im);\n }\n}
\n\r\nThis post required much work to be produced, so I hope I didn't forgot anything and you found these information useful.
\nAs always, there's a Github repo with all the code of this post.
L'articolo “Principal” FFT components as efficient features extrator proviene da Eloquent Arduino Blog.
\n", "content_text": "Fourier Transform is probably the most well known algorithm for feature extraction from time-dependent data (in particular speech data), where frequency holds a great deal of information. Sadly, computing the transform over the whole spectrum of the signal still requires O(NlogN) with the best implementation (FFT - Fast Fourier Transform); we would like to achieve faster computation on our microcontrollers.\nIn this post I propose a partial, naive linear-time implementation of the Fourier Transform you can use to extract features from your data for Machine Learning models.\n\n\nTable of contentsTraining-aware FFTAccuracy comparisonHow to use Principal FFT in PythonHow to use Principal FFT in CBenchmarking\nDISCLAIMER\nThe contents of this post represent my own knowledge and are not supported by any academic work (as far as I know). It may really be the case that the findings of my work don't apply to your own projects; yet, I think this idea can turn useful in solving certain kind of problems.\nTraining-aware FFT\nFourier transform is used to describe a signal over its entire frequency range. This is useful in a number of applications, but here we're focused on the FT for the sole purpose of extracting features to be used with Machine learning models.\nFor this reason, we don't actually need a full description of the input signal: we're only interested in extracting some kind of signature that a ML model can use to distinguish among the different classes. 
Noticing that in a signal spectrum most frequencies have a low magnitude (as you can see in the picture above), the idea to only keep the most important frequencies came to my mind as a mean to speed up the computation on resource constrained microcontrollers.\nI was thinking to a kind of PCA (Principal Component Analysis), but using FFT spectrum as features.\nSince we will have a training set with the raw signals, we would like to select the most prominent frequencies among all the samples and apply the computation only on those: even using the naive implementation of FFT, this will yield a linear-time implementation.\nAccuracy comparison\nHow does this Principal FFT compare to, let's say, PCA as a dimensionality reduction algorithm w.r.t model accuracy? Let's see the numbers!\n\nDownload the Principal FFT benchmark spreadsheet\nI couldn't find many examples of the kind of datasets I wished to test, but in the image you can see different types of data:\n\nhuman activity classification from smartphone data\ngesture classification by IMU data\nMNIST handwritten digits image data\nfree speech audio data\n\nWe can note a couple findings:\n\nPrincipal FFT is almost on par with PCA after a certain number of components\nPrincipalFFT definitely leaves PCA behind on audio data\n\nFrom even this simple analysis you should be convinced that Principal FFT can be (under certain cases) a fast, performant features extractor for your projects that involve time-dependant data.\nHow to use Principal FFT in Python\nI created a Python package to use Principal FFT, called principal-fft.\npip install principal-fft\nThe class follows the Transformer API from scikit-learn, so it has fit and transform methods.\nfrom principalfft import PrincipalFFT\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.datasets import load_digits\nfrom sklearn.ensemble import RandomForestClassifier\n\nmnist = load_digits()\nX, y = mnist.data, mnist.target\nXfft = 
PrincipalFFT(n_components=10).fit_transform(X)\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)\nXfft_train, Xfft_test, y_train, y_test = train_test_split(Xfft, y, test_size=0.3)\n\nclf = RandomForestClassifier(50, min_samples_leaf=5).fit(X_train, y_train)\nprint("Raw score", clf.score(X_test, y_test))\n\nclf = RandomForestClassifier(50, min_samples_leaf=5).fit(Xfft_train, y_train)\nprint("FFT score", clf.score(Xfft_test, y_test))\nMy results are 0.09 for raw data and 0.78 for FFT transformed: quite a big difference!\nAs with any dimensionality reduction, n_components is an hyperparameter you have to tune for your specific project: from my experiments, you shouldn't go lower than 8 to achieve a reasonable accuracy.\nHow to use Principal FFT in C\nSo, now that we tested our Principal FFT transformer in Python and achieved good results, how do we use it on our microcontroller? Of course with the micromlgen porter: it is now (version 1.1.9) able to port PrincipalFFT objects to plain C.\npip install micromlgen==1.1.9\nWhat does the C code look like?\nvoid principalFFT(float *features, float *fft) {\n // apply principal FFT (naive implementation for the top 10 frequencies only)\n const int topFrequencies[] = { 0, 8, 17, 16, 1, 9, 2, 7, 15, 6 };\n\n for (int i = 0; i < 10; i++) {\n const int k = topFrequencies[i];\n const float harmonic = 0.09817477042468103 * k;\n float re = 0;\n float im = 0;\n\n // optimized case\n if (k == 0) {\n for (int n = 0; n < 64; n++) {\n re += features[n];\n }\n }\n\n else {\n for (int n = 0; n < 64; n++) {\n const float harmonic_n = harmonic * n;\n const float cos_n = cos(harmonic_n);\n const float sin_n = sin(harmonic_n);\n re += features[n] * cos_n;\n im -= features[n] * sin_n;\n }\n }\n\n fft[i] = sqrt(re * re + im * im);\n }\n}\nThis is the most direct porting available.\nIn the Benchmarks section, we'll see how this implementation can be speed-up with alternative implementations.\nBenchmarking\nThe following 
table reports the benchmark on the MNIST dataset (64 features) with 10 principal FFT components vs various tecniques to decrease the computation time at the expense of memory usage.\n\n\n\nAlgorithm\nFlash (Kb)\nExecution time (micros)\n\n\n\n\nNone\n137420\n-\n\n\narduinoFFT library\n147812\n3200\n\n\nprincipalFFT\n151404\n4400\n\n\nprincipalFFT w/ cos+sin LUT\n152124\n900\n\n\nprincipalFFT w/ cos LUT + sin sign LUT\n150220\n1250\n\n\n\n*all the benchmarks were run on the Arduino 33 Nano BLE Sense\nSome thoughts:\n\nprincipalFFT w/ cos+sin LUT means I pre-compute the values of sin and cos at compile time, so there's no computation on the board; of course these lookup tables will eat some memory\nprincipalFFT w/ cos LUT + sin sign LUT means I pre-compute the cos values only and compute sin using sqrt(1 - cos(x)^2); it adds some microseconds to the computation, but requires less memory\narduinoFFT library is faster than principalFFT in the execution time and requires less memory, even if principalFFT is only computing 10 frequencies: I need to investigate how it can achieve such performances\n\nYou can activate the LUT functionality with:\nfrom micromlgen import port\nfrom principalfft import PrincipalFFT\n\nfft = PrincipalFFT(n_components=10).fit(X)\n\n# cos lookup, sin computed\nport(fft, lookup_cos=True)\n\n# cos + sin lookup\nport(fft, lookup_cos=True, lookup_sin=True)\nHere's how the C code looks like with LUT.\nvoid principalFFT(float *features, float *fft) {\n // apply principal FFT (naive implementation for the top N frequencies only)\n const int topFrequencies[] = { 0, 8, 17, 16, 1, 9, 2, 7, 15, 6 };\n const float cosLUT[10][64] = {\n { 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0},\n { 
1.0, 0.7071, 6.1232e-17, -0.7071, -1.0, -0.7071, -1.8369e-16, 0.7071, 1.0, 0.7071, 3.0616e-16, -0.7071, -1.0, -0.7071, -4.2862e-16, 0.7071, 1.0, 0.7071, 5.5109e-16, -0.7071, -1.0, -0.7071, -2.4499e-15, 0.7071, 1.0, 0.7071, -9.8033e-16, -0.7071, -1.0, -0.7071, -2.6948e-15, 0.7071, 1.0, 0.7071, -7.3540e-16, -0.7071, -1.0, -0.7071, -2.9397e-15, 0.7071, 1.0, 0.7071, -4.9047e-16, -0.7071, -1.0, -0.7071, -3.1847e-15, 0.7071, 1.0, 0.7071, -2.4554e-16, -0.7071, -1.0, -0.7071, -3.4296e-15, 0.7071, 1.0, 0.7071, -6.1898e-19, -0.7071, -1.0, -0.7071, -3.6745e-15, 0.7071}, ... };\n const bool sinLUT[10][64] = {\n { false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false},\n { false, true, true, true, true, false, false, false, false, true, true, true, true, false, false, false, false, true, true, true, true, false, false, false, false, true, true, true, true, false, false, false, false, true, true, true, true, false, false, false, false, true, true, true, true, false, false, false, false, true, true, true, false, false, false, false, false, true, true, true, true, false, false, false}, ...};\n\n for (int i = 0; i < 10; i++) {\n const int k = topFrequencies[i];\n const float harmonic = 0.09817477042468103 * k;\n float re = 0;\n float im = 0;\n // optimized case\n if (k == 0) {\n for (int n = 0; n < 64; n++) {\n re += features[n];\n }\n }\n\n else {\n for (int n = 0; n < 64; n++) {\n const float cos_n = cosLUT[i][n];\n const float sin_n = sinLUT[i][n] ? 
sqrt(1 - cos_n * cos_n) : -sqrt(1 - cos_n * cos_n);\n re += features[n] * cos_n;\n im -= features[n] * sin_n;\n }\n }\n\n fft[i] = sqrt(re * re + im * im);\n }\n}\n\r\n\r\n\r\n \r\n\tFinding this content useful?\r\n\r\n\t\r\n\r\n\t\r\n\t\t\r\n\t\t\r\n\t \r\n \r\n \r\n \r\n\r\n\r\n\r\n\n\nThis post required much work to be produced, so I hope I didn't forgot anything and you found these information useful.\nAs always, there's a Github repo with all the code of this post.\nL'articolo “Principal” FFT components as efficient features extrator proviene da Eloquent Arduino Blog.", "date_published": "2020-09-05T10:52:02+02:00", "date_modified": "2020-09-05T17:14:34+02:00", "authors": [ { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" } ], "author": { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" }, "tags": [ "microml", "Arduino Machine learning" ] }, { "id": "https://eloquentarduino.github.io/?p=1282", "url": "https://eloquentarduino.github.io/2020/08/better-word-classification-with-arduino-33-ble-sense-and-machine-learning/", "title": "Better word classification with Arduino Nano 33 BLE Sense and Machine Learning", "content_html": "Let's revamp the post I wrote about word classification using Machine Learning on Arduino, this time using a proper microphone (the MP34DT05 mounted on the Arduino Nano 33 BLE Sense) instead of a chinese, analog one: will the results improve?
\n \n\n
Updated on 16 October 2020: step by step explanation of the process with ready-made sketch code
\nThis tutorial will teach you how to capture audio from the Arduino Nano 33 BLE Sense microphone and classify it: at the end of this post, you will have a trained model able to detect in real-time the word you tell, among the ones that you trained it to recognize. The classification will occur directly on your Arduino board.
\nThis is not a general-purpose speech recognizer able to convert speech-to-text: it works only on the words you train it on.
\nHardware
\n\nSoftware
\nTo install the software, open your terminal and install the libraries.
\npip install -U scikit-learn\npip install -U micromlgen
\nFirst of all, we need to capture a bunch of examples of the words we want to recognize.
\nIn the original post, we used an analog microphone to record the audio. It is for sure the easiest way to interact with audio on a microcontroller since you only need to analogRead()
the selected pin to get a value from the sensor.
This simplicity, however, comes at the cost of nearly non-existent signal pre-processing by the sensor itself: most of the time, you will get junk - I don't want to be rude, but that's it.</p>
\nThe microphone mounted on the Arduino Nano 33 BLE Sense (the MP34DT05) is fortunately much better than this: it gives you access to a modulated signal much more suitable for our processing needs.
\nThe modulation used is pulse-density: I won't try to explain you how this works since I'm not an expert in DSP and neither it is the main scope of this article (refer to Wikipedia for some more information).
\nWhat matters to us is that we can grab an array of bytes from the microphone and extract its Root Mean Square (a.k.a. RMS) to be used as a feature for our Machine Learning model.
\nI had some difficulty finding examples on how to access the microphone on the Arduino Nano 33 BLE Sense board: fortunately, there's a Github repo from DelaGia that shows how to access all the sensors of the board.
\nI extracted the microphone part and incapsulated it in an easy to use class, so you don't really need to dig into the implementation details if you're not interested.
\nWhen loaded on your Arduino Nano 33 BLE Sense, the following sketch will await for you to speak in front of the microphone: once it detects a sound, it will record 64 audio values and print them to the serial monitor.
\nFrom my experience, 64 samples are sufficient to cover short words such as yes, no, play, stop: if you plan to classify longer words, you may need to increase this number.
\nDownload the Arduino Nano 33 BLE Sense - Capture audio samples sketch, open it the Arduino IDE and flash it to your board.
\nHere's the main code.
\n#include "Mic.h"\n\n// tune as per your needs\n#define SAMPLES 64\n#define GAIN (1.0f/50)\n#define SOUND_THRESHOLD 2000\n\nfloat features[SAMPLES];\nMic mic;\n\nvoid setup() {\n Serial.begin(115200);\n PDM.onReceive(onAudio);\n mic.begin();\n delay(3000);\n}\n\nvoid loop() {\n // await for a word to be pronounced\n if (recordAudioSample()) {\n // print features to serial monitor\n for (int i = 0; i < SAMPLES; i++) {\n Serial.print(features[i], 6);\n Serial.print(i == SAMPLES - 1 ? '\\n' : ',');\n }\n\n delay(1000);\n }\n\n delay(20);\n}\n\n/**\n * PDM callback to update mic object\n */\nvoid onAudio() {\n mic.update();\n}\n\n/**\n * Read given number of samples from mic\n */\nbool recordAudioSample() {\n if (mic.hasData() && mic.data() > SOUND_THRESHOLD) {\n\n for (int i = 0; i < SAMPLES; i++) {\n while (!mic.hasData())\n delay(1);\n\n features[i] = mic.pop() * GAIN;\n }\n\n return true;\n }\n\n return false;\n}
\nNow that we have the acquisition logic in place, it's time for you to record some samples of the words you want to classify.
\nNow you have to capture as many samples of the words you want to classify as possible.
\nOpen the serial monitor and pronounce a word near the microphone: a line of numbers will be printed on the monitor.
\nThis is the description of your word.
\nYou need many lines like this for an accurate prediction, so keep repeating the same word 15-30 times.
\nAfter you repeated the same words many times, copy the content of the serial monitor and save it in a CSV file named after the word, for example yes.csv
.
Then clear the serial monitor and repeat the process for each word.
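\n<p>Text copied from the serial monitor can easily pick up a truncated line or a stray boot message, so it may be worth checking each file before training. The following is a minimal sketch of such a check, under the assumption that every valid line holds exactly 64 comma-separated numbers (the <code>SAMPLES</code> value of the capture sketch); <code>check_recording</code> is a hypothetical helper, not part of any library.</p>

```python
import csv
import io


def check_recording(csv_text, expected_features=64):
    # Return the 1-based numbers of the rows that do not hold
    # exactly `expected_features` numeric values
    bad_rows = []
    for row_number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # blank lines are harmless, skip them
        try:
            values = [float(v) for v in row]
        except ValueError:
            values = []  # non-numeric content, e.g. a boot message
        if len(values) != expected_features:
            bad_rows.append(row_number)
    return bad_rows


# usage (assuming the folder layout used in this post):
# with open('data/yes.csv') as f:
#     print(check_recording(f.read()))
```

<p>An empty list means every line has the expected shape; any reported row number can be deleted by hand before moving on.</p>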
\nKeep all these files in a folder because we need them to train our classifier.
\nNow that we have the samples, it's time to train the classifier.
\nCreate a Python project in your favourite IDE or use your favourite text editor, if you don't have one.
\nAs described in my post about how to train a classifier, we create a Python script that reads all the files inside a folder and concatenates them in a single array you feed to the classifier model.
\nBe sure your folder structure is like the following:
\nArduinoWordClassification\n |-- train_classifier.py\n |-- data/\n |---- yes.csv\n |---- no.csv\n |---- play.csv\n |---- any other .csv file you recorded
\n# file: train_classifier.py\n\nimport numpy as np\nfrom os.path import basename\nfrom glob import glob\nfrom sklearn.svm import SVC\nfrom micromlgen import port\nfrom sklearn.model_selection import train_test_split\n\ndef load_features(folder):\n dataset = None\n classmap = {}\n for class_idx, filename in enumerate(glob('%s/*.csv' % folder)):\n class_name = basename(filename)[:-4]\n classmap[class_idx] = class_name\n samples = np.loadtxt(filename, dtype=float, delimiter=',')\n labels = np.ones((len(samples), 1)) * class_idx\n samples = np.hstack((samples, labels))\n dataset = samples if dataset is None else np.vstack((dataset, samples))\n return dataset, classmap\n\nnp.random.seed(0)\ndataset, classmap = load_features('data')\nX, y = dataset[:, :-1], dataset[:, -1]\n# this line is for testing your accuracy only: once you're satisfied with the results, set test_size to 1\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n\nclf = SVC(kernel='poly', degree=2, gamma=0.1, C=100)\nclf.fit(X_train, y_train)\n\nprint('Accuracy', clf.score(X_test, y_test))\nprint('Exported classifier to plain C')\nprint(port(clf, classmap=classmap))
\nAmong the classifiers I tried, SVM produced the best accuracy at 96% with 32 support vectors: it's not a super-tiny model, but it's quite small nevertheless.
\nIf you're not satisifed with SVM, you can use Decision Tree, Random Forest, Gaussian Naive Bayes, Relevant Vector Machines. See my other posts for a detailed description of each.
\nIn your console, after the accuracy score, you will have the plain C implementation of the classifier you trained. The following reports my SVM model.
\n// File: Classifier.h\n\n#pragma once\nnamespace Eloquent {\n namespace ML {\n namespace Port {\n class SVM {\n public:\n /**\n * Predict class for features vector\n */\n int predict(float *x) {\n float kernels[35] = { 0 };\n float decisions[6] = { 0 };\n int votes[4] = { 0 };\n kernels[0] = compute_kernel(x, 33.0 , 41.0 , 47.0 , 54.0 , 59.0 , 61.0 , 56.0 , 51.0 , 50.0 , 51.0 , 44.0 , 32.0 , 23.0 , 15.0 , 12.0 , 8.0 , 5.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 5.0 , 3.0 , 5.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 );\n kernels[1] = compute_kernel(x, 40.0 , 50.0 , 51.0 , 60.0 , 56.0 , 57.0 , 58.0 , 53.0 , 50.0 , 45.0 , 42.0 , 34.0 , 23.0 , 16.0 , 10.0 , 7.0 , 3.0 , 3.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 14.0 , 3.0 , 8.0 , 0.0 , 0.0 , 3.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 3.0 , 0.0 , 0.0 , 5.0 , 3.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 3.0 , 0.0 , 5.0 , 3.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 3.0 , 0.0 , 0.0 , 0.0 , 3.0 );\n kernels[2] = compute_kernel(x, 56.0 , 68.0 , 78.0 , 91.0 , 84.0 , 84.0 , 84.0 , 74.0 , 69.0 , 64.0 , 57.0 , 44.0 , 33.0 , 18.0 , 12.0 , 8.0 , 5.0 , 9.0 , 15.0 , 12.0 , 12.0 , 9.0 , 12.0 , 7.0 , 3.0 , 10.0 , 12.0 , 6.0 , 3.0 , 0.0 , 0.0 , 0.0 , 0.0 , 6.0 , 3.0 , 6.0 , 10.0 , 10.0 , 8.0 , 3.0 , 9.0 , 9.0 , 9.0 , 8.0 , 9.0 , 9.0 , 11.0 , 3.0 , 8.0 , 9.0 , 8.0 , 8.0 , 8.0 , 6.0 , 7.0 , 3.0 , 3.0 , 8.0 , 5.0 , 3.0 , 0.0 , 3.0 , 0.0 , 0.0 );\n\n // ...many other kernels computations...\n\n decisions[0] = 0.722587775297\n + kernels[1] * 3.35855e-07\n + kernels[2] * 1.64612e-07\n + kernels[4] * 6.00056e-07\n + kernels[5] * 3.5195e-08\n + kernels[7] * -4.2079e-08\n + kernels[8] * -4.2843e-08\n + kernels[9] * -9.994e-09\n + kernels[10] * -5.11065e-07\n + kernels[11] * -5.979e-09\n + kernels[12] * -4.4672e-08\n + kernels[13] * -1.5606e-08\n + 
kernels[14] * -1.2941e-08\n + kernels[15] * -2.18903e-07\n + kernels[17] * -2.31635e-07\n ;\n decisions[1] = -1.658344586719\n + kernels[0] * 2.45018e-07\n + kernels[1] * 4.30223e-07\n + kernels[3] * 1.00277e-07\n + kernels[4] * 2.16524e-07\n + kernels[18] * -4.81187e-07\n + kernels[20] * -5.10856e-07\n ;\n decisions[2] = -1.968607562265\n + kernels[0] * 3.001833e-06\n + kernels[3] * 4.5201e-08\n + kernels[4] * 1.54493e-06\n + kernels[5] * 2.81834e-07\n + kernels[25] * -5.93581e-07\n + kernels[26] * -2.89779e-07\n + kernels[27] * -1.73958e-06\n + kernels[28] * -1.09552e-07\n + kernels[30] * -3.09126e-07\n + kernels[31] * -1.294219e-06\n + kernels[32] * -5.37961e-07\n ;\n decisions[3] = -0.720663029823\n + kernels[6] * 1.4362e-08\n + kernels[7] * 6.177e-09\n + kernels[9] * 1.25e-08\n + kernels[10] * 2.05478e-07\n + kernels[12] * 2.501e-08\n + kernels[15] * 4.363e-07\n + kernels[16] * 9.147e-09\n + kernels[18] * -1.82182e-07\n + kernels[20] * -4.93707e-07\n + kernels[21] * -3.3084e-08\n ;\n decisions[4] = -1.605747746589\n + kernels[6] * 6.182e-09\n + kernels[7] * 1.3853e-08\n + kernels[8] * 2.12e-10\n + kernels[9] * 1.1243e-08\n + kernels[10] * 7.80681e-07\n + kernels[15] * 8.347e-07\n + kernels[17] * 1.64985e-07\n + kernels[23] * -4.25014e-07\n + kernels[25] * -1.134803e-06\n + kernels[34] * -2.52038e-07\n ;\n decisions[5] = -0.934328303475\n + kernels[19] * 3.3529e-07\n + kernels[20] * 1.121946e-06\n + kernels[21] * 3.44683e-07\n + kernels[22] * -6.23056e-07\n + kernels[24] * -1.4612e-07\n + kernels[28] * -1.24025e-07\n + kernels[29] * -4.31701e-07\n + kernels[31] * -9.2146e-08\n + kernels[33] * -3.8487e-07\n ;\n votes[decisions[0] > 0 ? 0 : 1] += 1;\n votes[decisions[1] > 0 ? 0 : 2] += 1;\n votes[decisions[2] > 0 ? 0 : 3] += 1;\n votes[decisions[3] > 0 ? 1 : 2] += 1;\n votes[decisions[4] > 0 ? 1 : 3] += 1;\n votes[decisions[5] > 0 ? 
2 : 3] += 1;\n int val = votes[0];\n int idx = 0;\n\n for (int i = 1; i < 4; i++) {\n if (votes[i] > val) {\n val = votes[i];\n idx = i;\n }\n }\n\n return idx;\n }\n\n /**\n * Convert class idx to readable name\n */\n const char* predictLabel(float *x) {\n switch (predict(x)) {\n case 0:\n return "no";\n case 1:\n return "stop";\n case 2:\n return "play";\n case 3:\n return "yes";\n default:\n return "Houston we have a problem";\n }\n }\n\n protected:\n /**\n * Compute kernel between feature vector and support vector.\n * Kernel type: poly\n */\n float compute_kernel(float *x, ...) {\n va_list w;\n va_start(w, 64);\n float kernel = 0.0;\n\n for (uint16_t i = 0; i < 64; i++) {\n kernel += x[i] * va_arg(w, double);\n }\n\n return pow((0.1 * kernel) + 0.0, 2);\n }\n };\n }\n }\n}
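To make the exported code easier to follow, here is a minimal Python sketch of the two ideas it relies on: the polynomial kernel and the one-vs-one voting. The function names `poly_kernel` and `one_vs_one_vote` are hypothetical (they are not part of micromlgen); the defaults mirror the values visible in the generated code (gamma 0.1, coef0 0, degree 2).

```python
from itertools import combinations


def poly_kernel(x, w, gamma=0.1, coef0=0.0, degree=2):
    """Polynomial kernel between a feature vector and one support vector."""
    dot = sum(xi * wi for xi, wi in zip(x, w))
    return (gamma * dot + coef0) ** degree


def one_vs_one_vote(decisions, n_classes):
    """Each decision compares one pair of classes: (0,1), (0,2), ..., (2,3).

    A positive decision votes for the first class of the pair,
    anything else votes for the second. Ties go to the lowest index,
    like the strict `>` comparison in the C loop.
    """
    votes = [0] * n_classes
    pairs = combinations(range(n_classes), 2)
    for decision, (first, second) in zip(decisions, pairs):
        votes[first if decision > 0 else second] += 1
    return votes.index(max(votes))
```

With 4 classes there are 6 pairs, which is why the exported code has 6 decisions and 4 vote counters.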
\nNow we have all the pieces we need to perform word classification on our Arduino board.
\nDownload the Arduino Nano 33 BLE Sense - Audio classification sketch, open it in the Arduino IDE and paste the plain C code you got in the console inside the Classifier.h
file (delete all of its existing contents first!).
Fine: it's time to deploy!
\nHit the upload button: if everything went fine, open the serial monitor and pronounce one of the words you recorded during Step 1
.
If all goes well, you will see the recognized word printed on the serial monitor.
\nHere's a quick demo (please forgive me for the bad video quality).
\nIf you liked this tutorial and it helped you successfully implement word classification on your Arduino Nano 33 BLE Sense, please share it on your social media so others can benefit too.
\nIf you run into trouble or have questions, don't hesitate to leave a comment: I will be happy to help.
\nThe post Better word classification with Arduino Nano 33 BLE Sense and Machine Learning first appeared on Eloquent Arduino Blog.
\n", "content_text": "Let's revamp the post I wrote about word classification using Machine Learning on Arduino, this time using a proper microphone (the MP34DT05 mounted on the Arduino Nano 33 BLE Sense) instead of a chinese, analog one: will the results improve?\nfrom https://www.udemy.com/course/learn-audio-processing-complete-engineers-course/\n\nUpdated on 16 October 2020: step by step explanation of the process with ready-made sketch code\nTable of contentsWhat you'll learnWhat you'll needStep 1. Capture audio samplesTheory: Pulse-density modulation (a.k.a. PDM)Practice: the code to capture the samplesAction: capture the words examplesStep 2. Train the machine learning modelStep 3. Deploy to your microcontroller\nWhat you'll learn\nThis tutorial will teach you how to capture audio from the Arduino Nano 33 BLE Sense microphone and classify it: at the end of this post, you will have a trained model able to detect in real-time the word you tell, among the ones that you trained it to recognize. The classification will occur directly on your Arduino board.\nThis is not a general-purpose speech recognizer able to convert speech-to-text: it works only on the words you train it on.\nWhat you'll need\n\n\nHardware\n\nArduino Nano 33 BLE Sense\n\n\n\nSoftware\n\nPython\nPython's module scikit-learn\nPython's module micromlgen\n\n\n\nTo install the software, open your terminal and install the libraries.\npip install -U scikit-learn\npip install -U micromlgen\nStep 1. Capture audio samples\nFirst of all, we need to capture a bunch of examples of the words we want to recognize.\nIn the original post, we used an analog microphone to record the audio. 
It is for sure the easiest way to interact with audio on a microcontroller since you only need to analogRead() the selected pin to get a value from the sensor.\nThis simplicity, however, comes at the cost of almost no signal pre-processing from the sensor itself: most of the time, you will get junk - I don't want to be rude, but that's it.\nTheory: Pulse-density modulation (a.k.a. PDM)\nThe microphone mounted on the Arduino Nano 33 BLE Sense (the MP34DT05) is fortunately much better than this: it gives you access to a modulated signal much more suitable for our processing needs.\nThe modulation used is pulse-density: I won't try to explain how this works since I'm not an expert in DSP, nor is it the main focus of this article (refer to Wikipedia for some more information).\nWhat matters to us is that we can grab an array of bytes from the microphone and extract its Root Mean Square (a.k.a. RMS) to be used as a feature for our Machine Learning model.\nI had some difficulty finding examples on how to access the microphone on the Arduino Nano 33 BLE Sense board: fortunately, there's a GitHub repo from DelaGia that shows how to access all the sensors of the board.\nI extracted the microphone part and encapsulated it in an easy-to-use class, so you don't really need to dig into the implementation details if you're not interested.\nPractice: the code to capture the samples\nWhen loaded on your Arduino Nano 33 BLE Sense, the following sketch will wait for you to speak in front of the microphone: once it detects a sound, it will record 64 audio values and print them to the serial monitor.\nFrom my experience, 64 samples are sufficient to cover short words such as yes, no, play, stop: if you plan to classify longer words, you may need to increase this number.\nI suggest you keep the words short: longer words will probably decrease the accuracy of the model. 
If you want nonetheless a longer duration, at least keep the number of words as low as possible\nDownload the Arduino Nano 33 BLE Sense - Capture audio samples sketch, open it the Arduino IDE and flash it to your board.\nHere's the main code.\n#include "Mic.h"\n\n// tune as per your needs\n#define SAMPLES 64\n#define GAIN (1.0f/50)\n#define SOUND_THRESHOLD 2000\n\nfloat features[SAMPLES];\nMic mic;\n\nvoid setup() {\n Serial.begin(115200);\n PDM.onReceive(onAudio);\n mic.begin();\n delay(3000);\n}\n\nvoid loop() {\n // await for a word to be pronounced\n if (recordAudioSample()) {\n // print features to serial monitor\n for (int i = 0; i < SAMPLES; i++) {\n Serial.print(features[i], 6);\n Serial.print(i == SAMPLES - 1 ? '\\n' : ',');\n }\n\n delay(1000);\n }\n\n delay(20);\n}\n\n/**\n * PDM callback to update mic object\n */\nvoid onAudio() {\n mic.update();\n}\n\n/**\n * Read given number of samples from mic\n */\nbool recordAudioSample() {\n if (mic.hasData() && mic.data() > SOUND_THRESHOLD) {\n\n for (int i = 0; i < SAMPLES; i++) {\n while (!mic.hasData())\n delay(1);\n\n features[i] = mic.pop() * GAIN;\n }\n\n return true;\n }\n\n return false;\n}\nNow that we have the acquisition logic in place, it's time for you to record some samples of the words you want to classify. 
\nAction: capture the words examples\nNow you have to capture as many samples of the words you want to classify as possible.\nOpen the serial monitor and pronounce a word near the microphone: a line of numbers will be printed on the monitor.\nThis is the description of your word.\nYou need many lines like this for an accurate prediction, so keep repeating the same word 15-30 times.\n**My advice**: while recording the samples, vary both the distance of your mounth from the mic and the intensity of your voice: this will produce a more robust classification model later on.\nAfter you repeated the same words many times, copy the content of the serial monitor and save it in a CSV file named after the word, for example yes.csv.\nThen clear the serial monitor and repeat the process for each word.\nKeep all these files in a folder because we need them to train our classifier.\nStep 2. Train the machine learning model\nNow that we have the samples, it's time to train the classifier.\nCreate a Python project in your favourite IDE or use your favourite text editor, if you don't have one.\nAs described in my post about how to train a classifier, we create a Python script that reads all the files inside a folder and concatenates them in a single array you feed to the classifier model.\nBe sure your folder structure is like the following:\nArduinoWordClassification\n |-- train_classifier.py\n |-- data/\n |---- yes.csv\n |---- no.csv\n |---- play.csv\n |---- any other .csv file you recorded\n# file: train_classifier.py\n\nimport numpy as np\nfrom os.path import basename\nfrom glob import glob\nfrom sklearn.svm import SVC\nfrom micromlgen import port\nfrom sklearn.model_selection import train_test_split\n\ndef load_features(folder):\n dataset = None\n classmap = {}\n for class_idx, filename in enumerate(glob('%s/*.csv' % folder)):\n class_name = basename(filename)[:-4]\n classmap[class_idx] = class_name\n samples = np.loadtxt(filename, dtype=float, delimiter=',')\n labels = 
np.ones((len(samples), 1)) * class_idx\n samples = np.hstack((samples, labels))\n dataset = samples if dataset is None else np.vstack((dataset, samples))\n return dataset, classmap\n\nnp.random.seed(0)\ndataset, classmap = load_features('data')\nX, y = dataset[:, :-1], dataset[:, -1]\n# this line is for testing your accuracy only: once you're satisfied with the results, set test_size to 1\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n\nclf = SVC(kernel='poly', degree=2, gamma=0.1, C=100)\nclf.fit(X_train, y_train)\n\nprint('Accuracy', clf.score(X_test, y_test))\nprint('Exported classifier to plain C')\nprint(port(clf, classmap=classmap))\nAmong the classifiers I tried, SVM produced the best accuracy at 96% with 32 support vectors: it's not a super-tiny model, but it's quite small nevertheless.\nIf you're not satisfied with SVM, you can use Decision Tree, Random Forest, Gaussian Naive Bayes or Relevance Vector Machines. See my other posts for a detailed description of each.\nIn your console, after the accuracy score, you will have the plain C implementation of the classifier you trained. 
The following reports my SVM model.\n// File: Classifier.h\n\n#pragma once\nnamespace Eloquent {\n namespace ML {\n namespace Port {\n class SVM {\n public:\n /**\n * Predict class for features vector\n */\n int predict(float *x) {\n float kernels[35] = { 0 };\n float decisions[6] = { 0 };\n int votes[4] = { 0 };\n kernels[0] = compute_kernel(x, 33.0 , 41.0 , 47.0 , 54.0 , 59.0 , 61.0 , 56.0 , 51.0 , 50.0 , 51.0 , 44.0 , 32.0 , 23.0 , 15.0 , 12.0 , 8.0 , 5.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 5.0 , 3.0 , 5.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 );\n kernels[1] = compute_kernel(x, 40.0 , 50.0 , 51.0 , 60.0 , 56.0 , 57.0 , 58.0 , 53.0 , 50.0 , 45.0 , 42.0 , 34.0 , 23.0 , 16.0 , 10.0 , 7.0 , 3.0 , 3.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 14.0 , 3.0 , 8.0 , 0.0 , 0.0 , 3.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 3.0 , 0.0 , 0.0 , 5.0 , 3.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 3.0 , 0.0 , 5.0 , 3.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 3.0 , 0.0 , 0.0 , 0.0 , 3.0 );\n kernels[2] = compute_kernel(x, 56.0 , 68.0 , 78.0 , 91.0 , 84.0 , 84.0 , 84.0 , 74.0 , 69.0 , 64.0 , 57.0 , 44.0 , 33.0 , 18.0 , 12.0 , 8.0 , 5.0 , 9.0 , 15.0 , 12.0 , 12.0 , 9.0 , 12.0 , 7.0 , 3.0 , 10.0 , 12.0 , 6.0 , 3.0 , 0.0 , 0.0 , 0.0 , 0.0 , 6.0 , 3.0 , 6.0 , 10.0 , 10.0 , 8.0 , 3.0 , 9.0 , 9.0 , 9.0 , 8.0 , 9.0 , 9.0 , 11.0 , 3.0 , 8.0 , 9.0 , 8.0 , 8.0 , 8.0 , 6.0 , 7.0 , 3.0 , 3.0 , 8.0 , 5.0 , 3.0 , 0.0 , 3.0 , 0.0 , 0.0 );\n\n // ...many other kernels computations...\n\n decisions[0] = 0.722587775297\n + kernels[1] * 3.35855e-07\n + kernels[2] * 1.64612e-07\n + kernels[4] * 6.00056e-07\n + kernels[5] * 3.5195e-08\n + kernels[7] * -4.2079e-08\n + kernels[8] * -4.2843e-08\n + kernels[9] * -9.994e-09\n + kernels[10] * -5.11065e-07\n + kernels[11] * -5.979e-09\n + kernels[12] * -4.4672e-08\n + 
kernels[13] * -1.5606e-08\n + kernels[14] * -1.2941e-08\n + kernels[15] * -2.18903e-07\n + kernels[17] * -2.31635e-07\n ;\n decisions[1] = -1.658344586719\n + kernels[0] * 2.45018e-07\n + kernels[1] * 4.30223e-07\n + kernels[3] * 1.00277e-07\n + kernels[4] * 2.16524e-07\n + kernels[18] * -4.81187e-07\n + kernels[20] * -5.10856e-07\n ;\n decisions[2] = -1.968607562265\n + kernels[0] * 3.001833e-06\n + kernels[3] * 4.5201e-08\n + kernels[4] * 1.54493e-06\n + kernels[5] * 2.81834e-07\n + kernels[25] * -5.93581e-07\n + kernels[26] * -2.89779e-07\n + kernels[27] * -1.73958e-06\n + kernels[28] * -1.09552e-07\n + kernels[30] * -3.09126e-07\n + kernels[31] * -1.294219e-06\n + kernels[32] * -5.37961e-07\n ;\n decisions[3] = -0.720663029823\n + kernels[6] * 1.4362e-08\n + kernels[7] * 6.177e-09\n + kernels[9] * 1.25e-08\n + kernels[10] * 2.05478e-07\n + kernels[12] * 2.501e-08\n + kernels[15] * 4.363e-07\n + kernels[16] * 9.147e-09\n + kernels[18] * -1.82182e-07\n + kernels[20] * -4.93707e-07\n + kernels[21] * -3.3084e-08\n ;\n decisions[4] = -1.605747746589\n + kernels[6] * 6.182e-09\n + kernels[7] * 1.3853e-08\n + kernels[8] * 2.12e-10\n + kernels[9] * 1.1243e-08\n + kernels[10] * 7.80681e-07\n + kernels[15] * 8.347e-07\n + kernels[17] * 1.64985e-07\n + kernels[23] * -4.25014e-07\n + kernels[25] * -1.134803e-06\n + kernels[34] * -2.52038e-07\n ;\n decisions[5] = -0.934328303475\n + kernels[19] * 3.3529e-07\n + kernels[20] * 1.121946e-06\n + kernels[21] * 3.44683e-07\n + kernels[22] * -6.23056e-07\n + kernels[24] * -1.4612e-07\n + kernels[28] * -1.24025e-07\n + kernels[29] * -4.31701e-07\n + kernels[31] * -9.2146e-08\n + kernels[33] * -3.8487e-07\n ;\n votes[decisions[0] > 0 ? 0 : 1] += 1;\n votes[decisions[1] > 0 ? 0 : 2] += 1;\n votes[decisions[2] > 0 ? 0 : 3] += 1;\n votes[decisions[3] > 0 ? 1 : 2] += 1;\n votes[decisions[4] > 0 ? 1 : 3] += 1;\n votes[decisions[5] > 0 ? 
2 : 3] += 1;\n int val = votes[0];\n int idx = 0;\n\n for (int i = 1; i < 4; i++) {\n if (votes[i] > val) {\n val = votes[i];\n idx = i;\n }\n }\n\n return idx;\n }\n\n /**\n * Convert class idx to readable name\n */\n const char* predictLabel(float *x) {\n switch (predict(x)) {\n case 0:\n return "no";\n case 1:\n return "stop";\n case 2:\n return "play";\n case 3:\n return "yes";\n default:\n return "Houston we have a problem";\n }\n }\n\n protected:\n /**\n * Compute kernel between feature vector and support vector.\n * Kernel type: poly\n */\n float compute_kernel(float *x, ...) {\n va_list w;\n va_start(w, 64);\n float kernel = 0.0;\n\n for (uint16_t i = 0; i < 64; i++) {\n kernel += x[i] * va_arg(w, double);\n }\n\n return pow((0.1 * kernel) + 0.0, 2);\n }\n };\n }\n }\n}\nStep 3. Deploy to your microcontroller\nNow we have all the pieces we need to perform word classification on our Arduino board.\nDownload the Arduino Nano 33 BLE Sense - Audio classification sketch, open it in the Arduino IDE and paste the plain C code you got in the console inside the Classifier.h file (delete all its contents before!).\nFine: it's time to deploy!\nHit the upload button: if everything went fine, open the serial monitor and pronounce one of the words you recorded during Step 1.\nHopefully, you will read the word on the serial monitor.\nHere's a quick demo (please forgive me for the bad video quality).\n\nhttps://eloquentarduino.github.io/wp-content/uploads/2020/08/Arduino-Nano-33-BLE-Sense-Word-classification.mp4\n\nIf you liked this tutorial and it helped you successfully implement word classification on your Arduino Nano 33 BLE Sense, please share it on your social media so others can benefit too.\nIf you have troubles or questions, don't hesitate to leave a comment: I will be happy to help you.\nL'articolo Better word classification with Arduino Nano 33 BLE Sense and Machine Learning proviene da Eloquent Arduino Blog.", "date_published": "2020-08-24T19:04:57+02:00", 
"date_modified": "2020-10-17T17:50:13+02:00", "authors": [ { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" } ], "author": { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" }, "tags": [ "microml", "ml", "Arduino Machine learning" ], "attachments": [ { "url": "https://eloquentarduino.github.io/wp-content/uploads/2020/08/Arduino-Nano-33-BLE-Sense-Word-classification.mp4", "mime_type": "video/mp4", "size_in_bytes": 5594095 } ] }, { "id": "https://eloquentarduino.github.io/?p=1225", "url": "https://eloquentarduino.github.io/2020/08/eloquentml-grows-its-family-of-classifiers-gaussian-naive-bayes-on-arduino/", "title": "EloquentML grows its family of classifiers: Gaussian Naive Bayes on Arduino", "content_html": "Are you looking for a top-performer classifiers with a minimal amount of parameters to tune? Look no further: Gaussian Naive Bayes is what you're looking for. And thanks to EloquentML you can now port it to your microcontroller.
\n\n\n
Naive Bayes classifiers are simple models based on probability theory that can be used for classification.
\nThey originate from the assumption of independence among the input variables. Even though this assumption doesn't hold true in the vast majority of cases, they often perform very well on many classification tasks, so they're quite popular.
\nGaussian Naive Bayes stacks another (mostly wrong) assumption: that the variables exhibit a Gaussian probability distribution.
\nI (and many others like me) will never understand how it is possible that so many wrong assumptions lead to such good performance!
\nNevertheless, what is important to us is that sklearn implements GaussianNB, so we can easily train such a classifier.
\nThe most interesting part is that GaussianNB
can be tuned with just a single parameter: var_smoothing
.
Don't ask me what it does in theory: in practice you change it and your accuracy can improve. This leads to an easy tuning process that doesn't involve an expensive grid search.
\nimport sklearn.datasets as d\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import normalize\nfrom sklearn.naive_bayes import GaussianNB\n\ndef pick_best(X_train, X_test, y_train, y_test):\n best = (None, 0)\n for var_smoothing in range(-7, 1):\n clf = GaussianNB(var_smoothing=pow(10, var_smoothing))\n clf.fit(X_train, y_train)\n y_pred = clf.predict(X_test)\n accuracy = (y_pred == y_test).sum()\n if accuracy > best[1]:\n best = (clf, accuracy)\n print('best accuracy', best[1] / len(y_test))\n return best[0]\n\niris = d.load_iris()\nX = normalize(iris.data)\ny = iris.target\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)\nclf = pick_best(X_train, X_test, y_train, y_test)
\nThis simple procedure will train a bunch of classifiers with a different var_smoothing
factor and pick the best performing one.
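For the curious, here is a minimal sketch of what var_smoothing does, going by the scikit-learn documentation: a fraction of the largest feature variance is added to every variance, so that a near-zero variance cannot blow up the division. The helper name `smooth_variances` and its arguments are hypothetical, for illustration only; this is not sklearn's internal code.

```python
def smooth_variances(class_variances, overall_variances, var_smoothing=1e-9):
    """Add a fraction of the largest overall variance to each class variance.

    class_variances:   per-feature variances of one class
    overall_variances: per-feature variances of the whole training set
    """
    epsilon = var_smoothing * max(overall_variances)
    return [variance + epsilon for variance in class_variances]


# a feature with zero variance inside one class becomes safe to divide by
smoothed = smooth_variances([0.0, 2.0], [1.0, 4.0], var_smoothing=0.5)
```

The larger var_smoothing is, the flatter every Gaussian becomes, which is why sweeping it over a few powers of ten is often enough.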
Once you have your trained classifier, porting it to C is as easy as always:
\nfrom micromlgen import port\n\nclf = pick_best(X_train, X_test, y_train, y_test)\nprint(port(clf))
\nAlways remember to run
\npip install --upgrade micromlgen
\n\nport
is a magic method able to port many classifiers: it will automatically detect the proper converter for you.
What does the exported code look like?
\n#pragma once\nnamespace Eloquent {\n namespace ML {\n namespace Port {\n class GaussianNB {\n public:\n /**\n * Predict class for features vector\n */\n int predict(float *x) {\n float votes[3] = { 0.0f };\n float theta[4] = { 0 };\n float sigma[4] = { 0 };\n theta[0] = 0.801139789889; theta[1] = 0.54726920354; theta[2] = 0.234408773313; theta[3] = 0.039178084094;\n sigma[0] = 0.000366881742; sigma[1] = 0.000907992556; sigma[2] = 0.000740960787; sigma[3] = 0.000274925514;\n votes[0] = 0.333333333333 - gauss(x, theta, sigma);\n theta[0] = 0.748563871324; theta[1] = 0.349390892644; theta[2] = 0.536186138345; theta[3] = 0.166747384117;\n sigma[0] = 0.000529727082; sigma[1] = 0.000847956504; sigma[2] = 0.000690057342; sigma[3] = 0.000311828658;\n votes[1] = 0.333333333333 - gauss(x, theta, sigma);\n theta[0] = 0.704497203305; theta[1] = 0.318862439835; theta[2] = 0.593755956917; theta[3] = 0.217288784452;\n sigma[0] = 0.000363782089; sigma[1] = 0.000813846722; sigma[2] = 0.000415475678; sigma[3] = 0.000758478249;\n votes[2] = 0.333333333333 - gauss(x, theta, sigma);\n // return argmax of votes\n uint8_t classIdx = 0;\n float maxVotes = votes[0];\n\n for (uint8_t i = 1; i < 3; i++) {\n if (votes[i] > maxVotes) {\n classIdx = i;\n maxVotes = votes[i];\n }\n }\n\n return classIdx;\n }\n\n protected:\n /**\n * Compute gaussian value\n */\n float gauss(float *x, float *theta, float *sigma) {\n float gauss = 0.0f;\n\n for (uint16_t i = 0; i < 4; i++) {\n gauss += log(sigma[i]);\n gauss += pow(x[i] - theta[i], 2) / sigma[i];\n }\n\n return gauss;\n }\n };\n }\n }\n }
\n\r\nAs you can see, we need a couple of "weight vectors":
\ntheta
is the mean of each featuresigma
is the variance of each feature (the exported code divides by it directly, without squaring)\nThe computation is quite thin: just a couple of operations per feature; the class with the highest score is then selected.
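The scoring above can be sketched in a few lines of Python. This mirrors the exported C code (prior minus a Gaussian penalty) rather than the textbook log-likelihood, and it assumes sigma holds per-feature variances, since the squared difference is divided by it directly. The names `gauss_penalty` and `predict` are only for illustration.

```python
import math


def gauss_penalty(x, theta, sigma):
    """Sum over features of log(variance) + squared distance / variance."""
    return sum(
        math.log(s) + (xi - t) ** 2 / s
        for xi, t, s in zip(x, theta, sigma)
    )


def predict(x, priors, thetas, sigmas):
    """Return the index of the class with the highest score."""
    scores = [
        prior - gauss_penalty(x, theta, sigma)
        for prior, theta, sigma in zip(priors, thetas, sigmas)
    ]
    return scores.index(max(scores))
```

A sample close to a class mean gets a small penalty for that class, so that class wins the argmax.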
\nBelow is a recap of a few benchmarks I ran on an Arduino Nano 33 BLE Sense.
\nClassifier | \nDataset | \nFlash | \nRAM | \nExecution time | \nAccuracy | \n
---|---|---|---|---|---|
GaussianNB | \nIris (150x4) | \n82 Kb | \n42 Kb | \n65 ms | \n97% | \n
LinearSVC | \nIris (150x4) | \n83 Kb | \n42 Kb | \n76 ms | \n99% | \n
GaussianNB | \nBreast cancer (80x40) | \n90 Kb | \n42 Kb | \n160 ms | \n77% | \n
LinearSVC | \nBreast cancer (80x40) | \n112 Kb | \n42 Kb | \n378 ms | \n73% | \n
GaussianNB | \nWine (100x13) | \n85 Kb | \n42 Kb | \n130 ms | \n97% | \n
LinearSVC | \nWine (100x13) | \n89 Kb | \n42 Kb | \n125 ms | \n99% | \n
We can see that the accuracy is on par with a linear SVM, reaching up to 97% on some datasets. Its simplicity shines with high-dimensional datasets (breast cancer), where execution time is less than half that of LinearSVC: I can see this pattern repeating with other real-world, medium-sized datasets.
\nThat's it: you can find the example project on GitHub.
\nThe post EloquentML grows its family of classifiers: Gaussian Naive Bayes on Arduino first appeared on Eloquent Arduino Blog.
\n", "content_text": "Are you looking for a top-performer classifiers with a minimal amount of parameters to tune? Look no further: Gaussian Naive Bayes is what you're looking for. And thanks to EloquentML you can now port it to your microcontroller.\n\n\n(Gaussian) Naive Bayes\nNaive Bayes classifiers are simple models based on the probability theory that can be used for classification.\nThey originate from the assumption of independence among the input variables. Even though this assumption doesn't hold true in the vast majority of the cases, they often perform very good at many classification tasks, so they're quite popular.\nGaussian Naive Bayes stack another (mostly wrong) assumption: that the variables exhibit a Gaussian probability distribution.\nI (and many others like me) will never understand how it is possible that so many wrong assumptions lead to such good performances!\nNevertheless, what is important to us is that sklearn implements GaussianNB, so we easily train such a classifier.\nThe most interesting part is that GaussianNB can be tuned with just a single parameter: var_smoothing.\nDon't ask me what it does in theory: in practice you change it and your accuracy can boost. 
This leads to an easy tuning process that doesn't involves expensive grid search.\nimport sklearn.datasets as d\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import normalize\nfrom sklearn.naive_bayes import GaussianNB\n\ndef pick_best(X_train, X_test, y_train, y_test):\n best = (None, 0)\n for var_smoothing in range(-7, 1):\n clf = GaussianNB(var_smoothing=pow(10, var_smoothing))\n clf.fit(X_train, y_train)\n y_pred = clf.predict(X_test)\n accuracy = (y_pred == y_test).sum()\n if accuracy > best[1]:\n best = (clf, accuracy)\n print('best accuracy', best[1] / len(y_test))\n return best[0]\n\niris = d.load_iris()\nX = normalize(iris.data)\ny = iris.target\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)\nclf = pick_best(X_train, X_test, y_train, y_test)\nThis simple procedure will train a bunch of classifiers with a different var_smoothing factor and pick the best performing one.\nEloquentML integration\nOnce you have your trained classifier, porting it to C is as easy as always:\nfrom micromlgen import port\n\nclf = pick_best()\nprint(port(clf))\nAlways remember to run \npip install --upgrade micromlgen\n\nport is a magic method able to port many classifiers: it will automatically detect the proper converter for you.\nWhat does the exported code looks like?\n#pragma once\nnamespace Eloquent {\n namespace ML {\n namespace Port {\n class GaussianNB {\n public:\n /**\n * Predict class for features vector\n */\n int predict(float *x) {\n float votes[3] = { 0.0f };\n float theta[4] = { 0 };\n float sigma[4] = { 0 };\n theta[0] = 0.801139789889; theta[1] = 0.54726920354; theta[2] = 0.234408773313; theta[3] = 0.039178084094;\n sigma[0] = 0.000366881742; sigma[1] = 0.000907992556; sigma[2] = 0.000740960787; sigma[3] = 0.000274925514;\n votes[0] = 0.333333333333 - gauss(x, theta, sigma);\n theta[0] = 0.748563871324; theta[1] = 0.349390892644; theta[2] = 0.536186138345; theta[3] = 0.166747384117;\n sigma[0] = 
0.000529727082; sigma[1] = 0.000847956504; sigma[2] = 0.000690057342; sigma[3] = 0.000311828658;\n votes[1] = 0.333333333333 - gauss(x, theta, sigma);\n theta[0] = 0.704497203305; theta[1] = 0.318862439835; theta[2] = 0.593755956917; theta[3] = 0.217288784452;\n sigma[0] = 0.000363782089; sigma[1] = 0.000813846722; sigma[2] = 0.000415475678; sigma[3] = 0.000758478249;\n votes[2] = 0.333333333333 - gauss(x, theta, sigma);\n // return argmax of votes\n uint8_t classIdx = 0;\n float maxVotes = votes[0];\n\n for (uint8_t i = 1; i < 3; i++) {\n if (votes[i] > maxVotes) {\n classIdx = i;\n maxVotes = votes[i];\n }\n }\n\n return classIdx;\n }\n\n protected:\n /**\n * Compute gaussian value\n */\n float gauss(float *x, float *theta, float *sigma) {\n float gauss = 0.0f;\n\n for (uint16_t i = 0; i < 4; i++) {\n gauss += log(sigma[i]);\n gauss += pow(x[i] - theta[i], 2) / sigma[i];\n }\n\n return gauss;\n }\n };\n }\n }\n }\n\r\n\r\n\r\n \r\n\tFinding this content useful?\r\n\r\n\t\r\n\r\n\t\r\n\t\t\r\n\t\t\r\n\t \r\n \r\n \r\n \r\n\r\n\r\n\r\n\nAs you can see, we need a couple of "weight vectors":\n\ntheta is the mean of each feature\nsigma is the standard deviation\n\nThe computation is quite thin: just a couple of operations; the class with the highest score is then selected.\nBenchmarks\nFollowing there's a recap of a couple benchmarks I run on an Arduino Nano 33 Ble Sense.\n\n\n\nClassifier\nDataset\nFlash\nRAM\nExecution time\nAccuracy\n\n\n\n\nGaussianNB\nIris (150x4)\n82 kb\n42 Kb\n65 ms\n97%\n\n\nLinearSVC\nIris (150x4)\n83 Kb\n42 Kb\n76 ms\n99%\n\n\nGaussianNB\nBreast cancer (80x40)\n90 Kb\n42 Kb\n160 ms\n77%\n\n\nLinearSVC\nBreast cancer (80x40)\n112 Kb\n42 Kb\n378 ms\n73%\n\n\nGaussianNB\nWine (100x13)\n85 Kb\n42 Kb\n130 ms\n97%\n\n\nLinearSVC\nWine (100x13)\n89 Kb\n42 Kb\n125 ms\n99%\n\n\n\nWe can see that the accuracy is on par with a linear SVM, reaching up to 97% on some datasets. 
Its semplicity shines with high-dimensional datasets (breast cancer) where execution time is half of the LinearSVC: I can see this pattern repeating with other real-world, medium-sized datasets.\n\nThis is it, you can find the example project on Github.\nL'articolo EloquentML grows its family of classifiers: Gaussian Naive Bayes on Arduino proviene da Eloquent Arduino Blog.", "date_published": "2020-08-02T10:44:36+02:00", "date_modified": "2020-08-02T11:36:42+02:00", "authors": [ { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" } ], "author": { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" }, "tags": [ "microml", "ml", "Arduino Machine learning" ] }, { "id": "https://eloquentarduino.github.io/?p=1214", "url": "https://eloquentarduino.github.io/2020/07/sefr-a-fast-linear-time-classifier-for-ultra-low-power-devices/", "title": "SEFR: A Fast Linear-Time Classifier for Ultra-Low Power Devices", "content_html": "A brand new binary classifier that's tiny and accurate, perfect for embedded scenarios: easily achieve 90+ % accuracy with a minimal memory footprint!
\n\n\n
A few weeks ago I was browsing arxiv.org, looking for inspiration on machine learning for microcontrollers, when I found exactly what I was looking for.
\nSEFR: A Fast Linear-Time Classifier for Ultra-Low Power Devices is a paper by Hamidreza Keshavarz, Mohammad Saniee Abadeh and Reza Rawassizadeh in which the authors develop a binary classifier that trains fast, predicts fast, and requires minimal memory.
\nIt has been specifically designed for embedded machine learning, so no optimization is required to run in on microcontrollers: it is tiny by design. In short, it uses a combination of the averages of the features as weights plus a bias to distinguish between positive and negative class. If you read the paper you will sure understand it: it's very straightforward.
\nThe authors provided both a C and a Python implementation on GitHub that you can read. I ported the C version "manually" to my Eloquent ML library and created a Python package called sefr by copy-pasting from the original repo.
\nHere's a Python example.
\nfrom sefr import SEFR\nfrom sklearn.datasets import load_iris\nfrom sklearn.preprocessing import normalize\nfrom sklearn.model_selection import train_test_split\n\nif __name__ == '__main__':\n iris = load_iris()\n X = normalize(iris.data)\n y = iris.target\n X = X[y < 2]\n y = y[y < 2]\n X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)\n clf = SEFR()\n clf.fit(X_train, y_train)\n print('accuracy', (clf.predict(X_test) == y_test).sum() / len(y_test))
\nHow good is it?
\nDataset | \nNo. of features | \nAccuracy | \n
---|---|---|
Iris | \n4 | \n100% | \n
Breast cancer | \n30 | \n89% | \n
Wine | \n13 | \n84% | \n
Digits | \n64 | \n99% | \n
Considering that the model only needs 1 weight per feature, I think these results are impressive!
\nThe Python port was done so I could integrate it easily into my micromlgen package.
\nHow to use it?
\nfrom sefr import SEFR\nfrom sklearn.datasets import load_iris\nfrom micromlgen import port\n\nif __name__ == '__main__':\n iris = load_iris()\n X = iris.data\n y = iris.target\n X = X[y < 2]\n y = y[y < 2]\n clf = SEFR()\n clf.fit(X, y)\n print(port(clf))
\nThe produced code is so compact I will report it here.
\n\r\n#pragma once\nnamespace Eloquent {\n namespace ML {\n namespace Port {\n class SEFR {\n public:\n /**\n * Predict class for features vector\n */\n int predict(float *x) {\n return dot(x, 0.084993602632 , -0.106163278477 , 0.488989863684 , 0.687022900763 ) <= 2.075 ? 0 : 1;\n }\n\n protected:\n /**\n * Compute dot product between features vector and classifier weights\n */\n float dot(float *x, ...) {\n va_list w;\n va_start(w, 4);\n float kernel = 0.0;\n\n for (uint16_t i = 0; i < 4; i++) {\n kernel += x[i] * va_arg(w, double);\n }\n\n return kernel;\n }\n };\n }\n }\n }
\nIn your sketch:
\n#include "IrisSEFR.h"\n#include "IrisTest.h"\n\nvoid setup() {\n Serial.begin(115200);\n}\n\nvoid loop() {\n Eloquent::ML::Port::SEFR clf;\n Eloquent::ML::Test::IrisTestSet testSet;\n\n testSet.test(clf);\n Serial.println(testSet.dump());\n delay(5000);\n}
\nYou have to clone the GitHub example to compile the code.
\nThat's all for today. I hope you will try this classifier and find a project it fits in: I'm very impressed by how easy it is to implement and by the accuracy it can achieve on benchmark datasets.
\nIn the coming weeks I'm thinking of implementing a multi-class version of this and seeing how it performs, so stay tuned!
\nThe post SEFR: A Fast Linear-Time Classifier for Ultra-Low Power Devices first appeared on Eloquent Arduino Blog.
\n", "content_text": "A brand new binary classifier that's tiny and accurate, perfect for embedded scenarios: easily achieve 90+ % accuracy with a minimal memory footprint!\n\n\nA few weeks ago I was wandering over arxiv.org looking for insipiration relative to Machine learning on microcontrollers when I found exactly what I was looking for.\nSEFR: A Fast Linear-Time Classifier for Ultra-Low Power Devices is a paper from Hamidreza Keshavarz, Mohammad Saniee Abadeh, Reza Rawassizadeh where the authors develop a binary classifier that is:\n\nfast during training\nfast during prediction\nrequires minimal memory\n\nIt has been specifically designed for embedded machine learning, so no optimization is required to run in on microcontrollers: it is tiny by design. In short, it uses a combination of the averages of the features as weights plus a bias to distinguish between positive and negative class. If you read the paper you will sure understand it: it's very straightforward.\nHow to use\nThe authors both provided a C and Python implementation on Github you can read. I ported the C version "manually" to my Eloquent ML library and created a Python package called sefr copy-pasting from the original repo.\nHere's a Python example.\nfrom sefr import SEFR\nfrom sklearn.datasets import load_iris\nfrom sklearn.preprocessing import normalize\nfrom sklearn.model_selection import train_test_split\n\nif __name__ == '__main__':\n iris = load_iris()\n X = normalize(iris.data)\n y = iris.target\n X = X[y < 2]\n y = y[y < 2]\n X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)\n clf = SEFR()\n clf.fit(X_train, y_train)\n print('accuracy', (clf.predict(X_test) == y_test).sum() / len(y_test))\nHow good is it?\n\n\n\nDataset\nNo. 
of features\nAccuracy\n\n\n\n\nIris\n4\n100%\n\n\nBreast cancer\n30\n89%\n\n\nWine\n13\n84%\n\n\nDigits\n64\n99%\n\n\n\nConsidering that the model only needs 1 weight per feature, I think this results are impressive!\nMicromlgen integration\nThe Python porting was done so I could integrate it easily in my micromlgen package.\nHow to use it?\nfrom sefr import SEFR\nfrom sklearn.datasets import load_iris\nfrom micromlgen import port\n\nif __name__ == '__main__':\n iris = load_iris()\n X = iris.data\n y = iris.target\n X = X[y < 2]\n y = y[y < 2]\n clf = SEFR()\n clf.fit(X_train, y_train)\n print(port(clf))\nThe produced code is so compact I will report it here.\n\r\n\r\n\r\n \r\n\tFinding this content useful?\r\n\r\n\t\r\n\r\n\t\r\n\t\t\r\n\t\t\r\n\t \r\n \r\n \r\n \r\n\r\n\r\n\r\n\n#pragma once\nnamespace Eloquent {\n namespace ML {\n namespace Port {\n class SEFR {\n public:\n /**\n * Predict class for features vector\n */\n int predict(float *x) {\n return dot(x, 0.084993602632 , -0.106163278477 , 0.488989863684 , 0.687022900763 ) <= 2.075 ? 0 : 1;\n }\n\n protected:\n /**\n * Compute dot product between features vector and classifier weights\n */\n float dot(float *x, ...) 
{\n va_list w;\n va_start(w, 4);\n float kernel = 0.0;\n\n for (uint16_t i = 0; i < 4; i++) {\n kernel += x[i] * va_arg(w, double);\n }\n\n return kernel;\n }\n };\n }\n }\n }\nIn your sketch:\n#include "IrisSEFR.h"\n#include "IrisTest.h"\n\nvoid setup() {\n Serial.begin(115200);\n}\n\nvoid loop() {\n Eloquent::ML::Port::SEFR clf;\n Eloquent::ML::Test::IrisTestSet testSet;\n\n testSet.test(clf);\n Serial.println(testSet.dump());\n delay(5000);\n}\nYou have to clone the Github example to compile the code.\n\nThat's all for today, I hope you will try this classifier and find a project it fits in: I'm very impressed by the easiness of implementation yet the accuracy it can achieve on benchmark datasets.\nIn the next weeks I'm thinking in implementing a multi-class version of this and see how it performs, so stay tuned!\nL'articolo SEFR: A Fast Linear-Time Classifier for Ultra-Low Power Devices proviene da Eloquent Arduino Blog.", "date_published": "2020-07-10T17:09:58+02:00", "date_modified": "2020-07-12T17:04:14+02:00", "authors": [ { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" } ], "author": { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" }, "tags": [ "microml", "Arduino Machine learning" ] }, { "id": "https://eloquentarduino.github.io/?p=1174", "url": "https://eloquentarduino.github.io/2020/06/arduino-dimensionality-reduction-pca-for-machine-learning-projects/", "title": "Arduino dimensionality reduction (PCA) for Machine Learning projects", "content_html": "When working with Machine Learning projects on microcontrollers and embedded devices the dimension of features can become a limiting factor due to the lack of RAM: dimensionality reduction (eg. 
PCA) will help you shrink your models and even achieve higher prediction accuracy.
\n\n\n
Dimensionality reduction is a technique you often see in Machine Learning projects. By stripping away "unimportant" or redundant information, it generally helps speed up the training process and achieve higher classification performance.
\nSince we now know we can run Machine Learning on Arduino boards and embedded microcontrollers, it can become a key tool to squeeze the most out of our boards.
\nIn the specific case of resource-constrained devices such as older Arduino boards (the UNO, for example, has only 2 KB of RAM), it can be decisive in unlocking application scenarios where the high dimensionality of the input features would otherwise not allow any model to fit.
\nLet's take the Gesture classification project as an example: among the different classifiers we trained, only one fitted on the Arduino UNO, since most of them required too much flash memory due to the high dimension of features (90) and support vectors (25 to 61).
\nIn this post I will pick that example up again and see if dimensionality reduction can help close this gap.
\nIf you are working on a project with many features, let me know in the comments so I can create a detailed list of real world examples.
\nAmong the many algorithms available for dimensionality reduction, I decided to start with PCA (Principal Component Analysis) because it's one of the most widespread. In the next weeks I will probably work on porting other alternatives.
\nIf you have never used my Python package micromlgen, I invite you to read the introduction post first to get familiar with it.
\nAlways remember to install the latest version, since I publish frequent updates.
\npip install --upgrade micromlgen
\nNow it is pretty straightforward to convert a sklearn PCA transformer to plain C: you use the magic function port
. In addition to converting SVM/RVM classifiers, it is now able to export PCA too.
from sklearn.decomposition import PCA\nfrom sklearn.datasets import load_iris\nfrom micromlgen import port\n\nif __name__ == '__main__':\n X = load_iris().data\n pca = PCA(n_components=2, whiten=False).fit(X)\n\n print(port(pca))
\nTo use the exported code, we first have to include it in our sketch. Save the contents to a file (I named it pca.h
) in the same folder as your .ino
project and include it.
#include "pca.h"\n\n// this was trained on the IRIS dataset, with 2 principal components\nEloquent::ML::Port::PCA pca;
\nThe pca
object is now able to take an array of size N as input and return an array of size K as output, with K < N usually.
void setup() {\n float x_input[4] = {5.1, 3.5, 1.4, 0.2};\n float x_output[2];\n\n pca.transform(x_input, x_output);\n}
\nThat's it: now you can run your classifier on x_output
.
#include "pca.h"\n#include "svm.h"\n\nEloquent::ML::Port::PCA pca;\nEloquent::ML::Port::SVM clf;\n\nvoid setup() {\n float x_input[4] = {5.1, 3.5, 1.4, 0.2};\n float x_output[2];\n int y_pred;\n\n pca.transform(x_input, x_output);\n\n y_pred = clf.predict(x_output);\n}
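If you are curious about what `transform` computes, the sketch below reproduces the operation in plain Python, assuming a non-whitened PCA as in the export example: subtract the per-feature mean, then take one dot product per principal component. The `mean` and `components` values are made up for illustration; they are not the ones fitted on the Iris dataset.

```python
def pca_transform(x, mean, components):
    """Project x (length N) onto K principal components (each of length N)."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(c * v for c, v in zip(component, centered))
            for component in components]


# hypothetical values: N = 4 input features, K = 2 components
mean = [1.0, 2.0, 3.0, 4.0]
components = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

print(pca_transform([2.0, 2.0, 5.0, 4.0], mean, components))  # [1.0, 2.0]
```

On the microcontroller this costs K x N multiplications per sample, plus storage for the K x N matrix and the mean vector.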
\nAs promised, let's take a look at how PCA dimensionality reduction can help in fitting classifiers that would otherwise be too large for our microcontrollers.
\nThis is the exact table from the Gesture classification project.
\nKernel | \nC | \nGamma | \nDegree | \nVectors | \nFlash size | \nRAM (b) | \nAvg accuracy | \n
---|---|---|---|---|---|---|---|
RBF | \n10 | \n0.001 | \n- | \n37 | \n53 Kb | \n1228 | \n99% | \n
Poly | \n100 | \n0.001 | \n2 | \n12 | \n25 Kb | \n1228 | \n99% | \n
Poly | \n100 | \n0.001 | \n3 | \n25 | \n40 Kb | \n1228 | \n97% | \n
Linear | \n50 | \n- | \n1 | \n40 | \n55 Kb | \n1228 | \n95% | \n
RBF | \n100 | \n0.01 | \n- | \n61 | \n80 Kb | \n1228 | \n95% | \n
The dataset has 90 features (30 samples x 3 axes), and the best classifiers reach 99% accuracy.
\nLet's pick the poly
kernel with degree 2
and see how much we can decrease the number of components while still achieving a good accuracy.
PCA components | \nAccuracy | \nSupport vectors | \n
---|---|---|
90 | \n99% | \n31 | \n
50 | \n99% | \n31 | \n
40 | \n99% | \n31 | \n
30 | \n90% | \n30 | \n
20 | \n90% | \n28 | \n
15 | \n90% | \n24 | \n
10 | \n99% | \n18 | \n
5 | \n76% | \n28 | \n
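We clearly see a couple of things:
\n- we still achieve 99% accuracy even with only 40 out of 90 principal components
\n- we get a satisfactory 90% accuracy even with only 15 components
\n- (this is a bit unexpected) there seems to be a sweet spot at 10 components, where the accuracy jumps back to 99%. This could be just a quirk of this particular dataset: don't expect to replicate this result on your own data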
\nWhat do these numbers mean to you? They mean your board has to do far fewer computations to give you a prediction and will probably be able to host a more complex model.
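A back-of-the-envelope count makes the saving tangible. The sketch below assumes one multiplication per feature for every support vector, plus one per input feature for every PCA output component; it ignores the kernel function itself, so take the numbers as a rough estimate. The figures (90 features with 31 vectors, 10 components with 18 vectors) come from the table above.

```python
def svm_multiplications(n_features, n_support_vectors):
    # one multiplication per feature for every support vector
    return n_features * n_support_vectors


def pca_multiplications(n_features, n_components):
    # one multiplication per input feature for every output component
    return n_features * n_components


without_pca = svm_multiplications(90, 31)
with_pca = pca_multiplications(90, 10) + svm_multiplications(10, 18)

print(without_pca, with_pca)  # 2790 1080
```

Even after paying for the projection, the prediction needs well under half of the multiplications.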
\nLet's check out the figures with n_components = 10
compared with the ones without PCA.
Kernel | \nPCA support vectors | \nPCA flash size | \nAccuracy | \n
---|---|---|---|
RBF C=10 | \n46 (+24%) | \n32 Kb (-40%) | \n99% | \n
RBF C=100 | \n28 (-54%) | \n32 Kb (-60%) | \n99% | \n
Poly 2 | \n13 (-48%) | \n28 Kb (+12%) | \n99% | \n
Poly 3 | \n24 (-4%) | \n32 Kb (-20%) | \n99% | \n
Linear | \n18 (-64%) | \n29 Kb (-47%) | \n99% | \n
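A couple of notes:
\n- accuracy increased (or stayed the same) for all kernels
\n- with one exception, flash size decreased by 20% to 60%
\n- now we can fit 3 classifiers on our Arduino UNO instead of only one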
\nI will probably spend some more time investigating the usefulness of PCA for Arduino Machine Learning projects, but for now that's it: it's a good starting point in my opinion.
\nThere's a little example sketch on Github that applies PCA to the IRIS dataset.
\nTell me what you think would be a clever application of dimensionality reduction in the world of microcontrollers, and let's see if we can build something great together.
\nThe post Arduino dimensionality reduction (PCA) for Machine Learning projects appeared first on Eloquent Arduino Blog.
\n", "content_text": "When working with Machine Learning projects on microcontrollers and embedded devices the dimension of features can become a limiting factor due to the lack of RAM: dimensionality reduction (eg. PCA) will help you shrink your models and even achieve higher prediction accuracy.\n\n\nWhy dimensionality reduction on Arduino microcontrollers?\nDimensionality reduction is a tecnique you see often in Machine Learning projects. By stripping away "unimportant" or redundant information, it generally helps speeding up the training process and achieving higher classification performances.\nSince we now know we can run Machine Learning on Arduino boards and embedded microcontrollers, it can become a key tool at our disposal to squeeze out the most out of our boards.\nIn the specific case of resource-constrained devices as old Arduino boards (the UNO for example, with only 2 kb of RAM), it can become a decisive turn in unlocking even more application scenarios where the high dimensionality of the input features would not allow any model to fit.\nLet's take the Gesture classification project as an example: among the different classifiers we trained, only one fitted on the Arduino UNO, since most of them required too much flash memory due to the high dimension of features (90) and support vectors (25 to 61).\nIn this post I will resume that example and see if dimensionality reduction can help reduce this gap.\nIf you are working on a project with many features, let me know in the comments so I can create a detailed list of real world examples.\nHow to export PCA (Principal Component Analysis) to plain C\nAmong the many algorithms available for dimensionality reduction, I decided to start with PCA (Principal Component Analysis) because it's one of the most widespread. 
In the next weeks I will probably work on porting other alternatives.\nIf you never used my Python package micromlgen I first invite you to read the introduction post to get familiar with it.\nAlways remember to install the latest version, since I publish frequent updates.\npip install --upgrade micromlgen\nNow it is pretty straight-forward to convert a sklearn PCA transformer to plain C: you use the magic method port. In addition to converting SVM/RVM classifiers, it is now able to export PCA too.\nfrom sklearn.decomposition import PCA\nfrom sklearn.datasets import load_iris\nfrom micromlgen import port\n\nif __name__ == '__main__':\n X = load_iris().data\n pca = PCA(n_components=2, whiten=False).fit(X)\n\n print(port(pca))\nHow to deploy PCA to Arduino\nTo use the exported code, we first have to include it in our sketch. Save the contents to a file (I named it pca.h) in the same folder of your .ino project and include it.\n#include "pca.h"\n\n// this was trained on the IRIS dataset, with 2 principal components\nEloquent::ML::Port::PCA pca;\nThe pca object is now able to take an array of size N as input and return an array of size K as output, with K < N usually.\nvoid setup() {\n float x_input[4] = {5.1, 3.5, 1.4, 0.2};\n float x_output[2];\n\n pca.transform(x_input, x_output);\n}\nThat's it: now you can run your classifier on x_output.\n#include "pca.h"\n#include "svm.h"\n\nEloquent::ML::Port::PCA pca;\nEloquent::ML::Port::SVM clf;\n\nvoid setup() {\n float x_input[4] = {5.1, 3.5, 1.4, 0.2};\n float x_output[2];\n int y_pred;\n\n pca.transform(x_input, x_output);\n\n y_pred = clf.predict(x_output);\n}\n\r\n\r\n\r\n \r\n\tFinding this content useful?\r\n\r\n\t\r\n\r\n\t\r\n\t\t\r\n\t\t\r\n\t \r\n \r\n \r\n \r\n\r\n\r\n\r\n\nA real world example\nAs I anticipated, let's take a look at how PCA dimensionality reduction can help in fitting classifiers that would otherwise be too large to fit on our microcontrollers.\nThis is the exact table from the Gesture 
classification project.\n\n\n\nKernel\nC\nGamma\nDegree\nVectors\nFlash size\nRAM (b)\nAvg accuracy\n\n\n\n\nRBF\n10\n0.001\n-\n37\n53 Kb\n1228\n99%\n\n\nPoly\n100\n0.001\n2\n12\n25 Kb\n1228\n99%\n\n\nPoly\n100\n0.001\n3\n25\n40 Kb\n1228\n97%\n\n\nLinear\n50\n-\n1\n40\n55 Kb\n1228\n95%\n\n\nRBF\n100\n0.01\n-\n61\n80 Kb\n1228\n95%\n\n\n\nThe dataset has 90 features (30 samples x 3 axes) and achieves 99% accuracy. \nLet's pick the poly kernel with degree 2 and see how much we can decrease the number of components while still achieving a good accuracy.\n\n\n\nPCA components\nAccuracy\nSupport vectors\n\n\n\n\n90\n99%\n31\n\n\n50\n99%\n31\n\n\n40\n99%\n31\n\n\n30\n90%\n30\n\n\n20\n90%\n28\n\n\n15\n90%\n24\n\n\n10\n99%\n18\n\n\n5\n76%\n28\n\n\n\nWe clearly see a couple of things:\n\nwe still achieve 99% accuracy even with only 40 out of 90 principal components\nwe get a satisfactory 90% accuracy even with only 15 components\n(this is a bit unexpected) it looks like there's a sweet spot at 10 components where the accuracy skyrockets to 99% again. This could be just a contingency of this particular dataset, don't expect to replicate this results on your own dataset\n\nWhat do these numbers mean to you? 
It means your board has to do many less computations to give you a prediction and will probably be able to host a more complex model.\nLet's check out the figures with n_components = 10 compared with the ones without PCA.\n\n\n\nKernel\nPCA support vectors\nPCA flash size\nAccuracy\n\n\n\n\nRBF C=10\n46 (+24%)\n32 Kb (-40%)\n99%\n\n\nRBF C=100\n28 (-54%)\n32 Kb (-60%)\n99%\n\n\nPoly 2\n13 (-48%)\n28 Kb (+12%)\n99%\n\n\nPoly 3\n24 (-4%)\n32 Kb (-20%)\n99%\n\n\nLinear\n18 (-64%)\n29 Kb (-47%)\n99%\n\n\n\nA couple notes:\n\naccuracy increased (on stayed the same) for all kernels\nwith one exception, flash size decreased in the range 20 - 50%\nnow we can fit 3 classifiers on our Arduino UNO instead of only one\n\nI will probably spend some more time investingating the usefulness of PCA for Arduino Machine Learning projects, but for now that's it: it's a good starting point in my opinion.\n\nThere's a little example sketch on Github that applies PCA to the IRIS dataset.\nTell me what you think may be a clever application of dimensionality reduction in the world of microcontrollers and see if we can build something great together.\nL'articolo Arduino dimensionality reduction (PCA) for Machine Learning projects proviene da Eloquent Arduino Blog.", "date_published": "2020-06-07T09:24:20+02:00", "date_modified": "2020-06-07T11:26:25+02:00", "authors": [ { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" } ], "author": { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" }, "tags": [ "microml", "pca", "Arduino Machine learning" ] }, { "id": "https://eloquentarduino.github.io/?p=1156", "url": "https://eloquentarduino.github.io/2020/05/anomaly-detection-on-your-arduino-microcontroller-via-one-class-svm/", "title": "Anomaly detection on your Arduino 
microcontroller via One Class SVM", "content_html": "Support Vector Machines are very often used for classification tasks, but you may not know that they're flexible enough to be used for anomaly detection and novelty detection too. Thanks to the micromlgen package, you can run One Class SVM on your Arduino microcontroller.
\n
\n
As the name implies, anomaly detection can be used to monitor a stream of data and alert you when something unexpected happens.
\nThink of an Industrial IoT setup where you have a bunch of sensors monitoring the working state of a production plant: you want to know as soon as possible if something bad is about to happen.
In this case, anomaly detection tells you if your machinery is acting in a different way from the normal state so you can take action.
\n(This application was suggested by two of my readers)
\nSay you're developing a super simple word classification project: you want to distinguish a door bell from a fire alarm (as per one of the two readers).
\nSo you train your SVM classifier and use micromlgen to run it on your Arduino microcontroller.
It works well, but there is a problem: you live in a noisy environment with many sounds, and not all of them will be either door bells or fire alarms. Since your classifier is binary, it has to classify every sound as either A or B.
\nThe solution is novelty detection: before running the binary SVM, you run the OneClassSVM to separate known sounds (bell and alarm) from unknown ones (e.g. dog barking).
\nIf OneClassSVM predicts the sound as a novelty, you discard it since it's of no interest to you. If it predicts the sound as known, you run the binary SVM.
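The two-stage logic can be sketched in a few lines of Python. The only convention taken from scikit-learn is that `OneClassSVM.predict` returns -1 for outliers and +1 for inliers; the function names and the two toy rules standing in for the trained models are hypothetical.

```python
NOVELTY = -1  # scikit-learn's OneClassSVM.predict returns -1 for outliers, +1 for inliers


def classify_sound(features, novelty_detector, classifier):
    """Return the classifier's label, or None if the sample looks unknown."""
    if novelty_detector(features) == NOVELTY:
        return None  # of no interest: skip the binary classifier
    return classifier(features)


# stand-ins for the two trained models (hypothetical one-feature rules)
def toy_detector(features):
    return 1 if 0.0 <= features[0] <= 10.0 else -1


def toy_classifier(features):
    return "door bell" if features[0] < 5.0 else "fire alarm"


for sample in ([2.0], [8.0], [42.0]):
    print(sample, classify_sound(sample, toy_detector, toy_classifier))
```

The binary classifier never sees the unknown sample, so it is never forced to pick between A and B for a barking dog.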
Porting a OneClassSVM from Python to plain C++ is as easy as a single command in the micromlgen package:
\nfrom sklearn.svm import OneClassSVM\nfrom micromlgen import port\n\nclf = OneClassSVM(kernel="rbf", nu=0.5, gamma=0.1)\n# X holds the training samples; OneClassSVM is unsupervised, so no labels are needed\nclf.fit(X)\nprint(port(clf))
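\nYou will need micromlgen version 1.0.2 to port OneClassSVM. If you have an outdated version, please run pip install --upgrade micromlgen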
\nIf you read my previous posts about micromlgen and SVM the above snippet should be familiar: with the latest release, port
is able to export SVC, LinearSVC, OneClassSVM and RVC (Relevance Vector Machines) to object-oriented C++.
Now you can embed the generated code in your Arduino sketch.
\n#include "OneClassSVM.h"\n\nEloquent::ML::Port::OneClassSVM clf;\n\nvoid setup() {\n Serial.begin(115200);\n delay(2000);\n\n for (int i = 0; i < DATASET_SIZE; i++)\n clf.predict(X[i]);\n}\n\nvoid loop() {}
\nI created an example sketch from a synthetic dataset for anomaly detection (copied from a scikit-learn example) you can run to get a feel of how it performs.
\nGo check out the Github repo
\nThe post Anomaly detection on your Arduino microcontroller via One Class SVM appeared first on Eloquent Arduino Blog.
\n", "content_text": "Support Vector Machines are very often used for classification tasks: but you may not know that they're so flexible they can be used for anomaly detection and novelty detection. Thanks to the micromlgen package, you can run One Class SVM on your Arduino microcontorller.\n\n\nWhat is anomaly / novelty detection useful for?\nDetect noise or anomalies\nAs the name implies, anomaly detection can be used to monitor a stream of data and alert you when something unexpected happens.\nThink of an Industrial IoT setup where you have a bunch of sensors monitoring the working state of a production plant: you want to know as soon as possible if something bad is gonna happen.\nIn this case, anomaly detection tells you if your machinery is acting in a different way from the normal state so you can take action.\nIgnore irrelevant data\n(This application was suggested from two of my readers)\nSay you're developing a super simple word classification project: you want to distinguish door bell from fire alarm (as per one of the two readers).\nSo you train your SVM classifier and use micromlgen to run it on your Arduino microcontroller.\nIt works well, but we have a problem: you live in a noisy environment with many sounds, so not all of them will either be door bells or fire alarms. Since your classifier is binary, it has to classify all of the sounds as either A or B.\nThe solution will be novelty detection: before running the binary SVM, you run the OneClassSVM to filter known sounds (bell and alarm) from unknown ones (eg. dog barking).\nIf OneClassSVM predicts the sound as a novelty, you discard it since it's of no interest for you. 
If it predicts the sound as known, you run the binary SVM.\n\r\n\r\n\r\n \r\n\tFinding this content useful?\r\n\r\n\t\r\n\r\n\t\r\n\t\t\r\n\t\t\r\n\t \r\n \r\n \r\n \r\n\r\n\r\n\r\n\nHow to run anomaly / novelty detection on Arduino microcontroller via OneClassSVM\nPorting a OneClassSVM from Python to plain C++ is as easy as a single command in the micromlgen package:\nfrom sklearn.svm import OneClassSVM\nfrom micromlgen import port\n\nclf = OneClassSVM(kernel="rbf", nu=0.5, gamma=0.1)\nclf.fit(X, y)\nprint(port(clf))\n\nYou will need micromlgen version 1.0.2 to port OneClassSVM. If you have an outdated version, please run pip install --upgrade micromlgen\n\nIf you read my previous posts about micromlgen and SVM the above snippet should be familiar: with the latest release, port is able to export either SVC, LinearSVC, OneClassSVC and RVC (Relevant Vector Machines) to object oriented C++.\nNow you can embed the generated code in your Arduino sketch.\n#include "OneClassSVM.h"\n\nEloquent::ML::Port::OneClassSVM clf;\n\nvoid setup() {\n Serial.begin(115200);\n delay(2000);\n\n for (int i = 0; i < DATASET_SIZE; i++)\n clf.predict(X[i]);\n}\n\nvoid loop() {}\n\nI created an example sketch from a synthetic dataset for anomaly detection (copied from a scikit-learn example) you can run to get a feel of how it performs.\nGo checkout the Github repo\nL'articolo Anomaly detection on your Arduino microcontroller via One Class SVM proviene da Eloquent Arduino Blog.", "date_published": "2020-05-31T18:44:36+02:00", "date_modified": "2020-05-31T19:44:50+02:00", "authors": [ { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" } ], "author": { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" }, "tags": [ "Arduino Machine learning" ] }, { "id": 
"https://eloquentarduino.github.io/?p=1079", "url": "https://eloquentarduino.github.io/2020/04/incremental-multiclass-classification-on-microcontrollers-one-vs-one/", "title": "Incremental multiclass classification on microcontrollers: One vs One", "content_html": "In earlier posts I showed you can run incremental binary classification on your microcontroller with Stochastic Gradient Descent or Passive-Aggressive classifier. Now it is time to upgrade your toolbelt with a new item: One-vs-One multiclass classifier.
\n\n
Many classifiers are, by nature, binary: they can only distinguish the positive class from the negative one. Many real-world problems, however, are multiclass: you have 3 or more possible outcomes to distinguish between.
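\nThere are a couple of ways to achieve this:
\n- One vs All: if your classifier is able to output a confidence score for its prediction, for N classes you train N classifiers, each able to recognize a single class. During inference, you pick the "most confident" one.
\n- One vs One: for N classes, you train N * (N-1) / 2 classifiers, one for each pair of classes. During inference, each classifier makes a prediction and you pick the class with the highest number of votes.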
\nSince SGD and Passive-Aggressive don't output a confidence score, I implemented the One vs One algorithm to tackle the multiclass classification problem on microcontrollers.
\nActually, One vs One is not a new type of classifier: it is really a "coordinator" class that sorts which samples go to which classifier. You can still choose your own classifier type to use.
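The "coordinator" role is easy to see in a small Python sketch of the prediction side: one binary classifier per pair of classes, each casting a vote, with the most voted class winning. This is an illustration of the voting scheme only, not the code of the OneVsOne class; the function name and the three threshold rules are hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, pairwise):
    """pairwise[(a, b)](x) returns either a or b; the most voted class wins."""
    votes = Counter(pairwise[pair](x) for pair in combinations(classes, 2))
    return votes.most_common(1)[0][0]


classes = [0, 1, 2]
# hypothetical binary classifiers working on a single feature
pairwise = {
    (0, 1): lambda x: 0 if x[0] < 1.0 else 1,
    (0, 2): lambda x: 0 if x[0] < 2.0 else 2,
    (1, 2): lambda x: 1 if x[0] < 3.0 else 2,
}

print(len(pairwise))  # 3 classes need 3 * (3 - 1) / 2 = 3 classifiers
for sample in ([0.5], [2.5], [3.5]):
    print(sample, one_vs_one_predict(sample, classes, pairwise))
```

Keep in mind that the number of binary classifiers grows quadratically with the number of classes, which matters when RAM is scarce.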
\nLike SGD and Passive-Aggressive, OneVsOne implements the classifier interface, so you will use the well-known fitOne
and predict
methods.
// Esp32 has some problems with min/max\n#define min(a, b) ((a) < (b) ? (a) : (b))\n#define max(a, b) ((a) > (b) ? (a) : (b))\n// you will actually need only one of SGD or PassiveAggressive\n#include "EloquentSGD.h"\n#include "EloquentPassiveAggressive.h"\n#include "EloquentOneVsOne.h"\n#include "EloquentAccuracyScorer.h"\n// this file defines FEATURES_DIM, NUM_CLASSES, TRAIN_SAMPLES and TEST_SAMPLES\n#include "dataset.h"\n\nusing namespace Eloquent::ML;\n\nvoid setup() {\n Serial.begin(115200);\n delay(3000);\n}\n\nvoid loop() {\n AccuracyScorer scorer;\n // OneVsOne needs the actual classifier class, the number of features and the number of classes\n OneVsOne<SGD<FEATURES_DIM>, FEATURES_DIM, NUM_CLASSES> clf;\n\n // clf.set() propagates the configuration to the actual classifiers\n // if a parameter does not exist on the classifier, it does nothing\n // in this example, alpha and momentum refer to SGD, C to Passive-Aggressive\n clf.set("alpha", 1);\n clf.set("momentum", 0.7);\n clf.set("C", 0.1);\n\n // fit\n // I noticed that repeating the training a few times over the same dataset increases performance to a certain extent: if you re-train it too much, performance will decay\n for (unsigned int i = 0; i < TRAIN_SAMPLES * 5; i++) {\n clf.fitOne(X_train[i % TRAIN_SAMPLES], y_train[i % TRAIN_SAMPLES]);\n }\n\n // predict\n for (int i = 0; i < TEST_SAMPLES; i++) {\n int y_true = y_test[i];\n int y_pred = clf.predict(X_test[i]);\n\n Serial.print("Predicted ");\n Serial.print(y_pred);\n Serial.print(" vs ");\n Serial.println(y_true);\n scorer.scoreOne(y_true, y_pred);\n }\n\n Serial.print("Accuracy = ");\n Serial.print(scorer.accuracy() * 100);\n Serial.print("% out of ");\n Serial.print(scorer.support());\n Serial.println(" samples");\n delay(30000);\n}
\nIf you refer to the previous posts on SGD and Passive-Aggressive, you'll notice that you can replace one with the other by changing a single line of code. This lets you experiment to find the best configuration for your project without hassle.
\nWell, accuracy varies.
\nIn my tests, I couldn't get predictable accuracy on all datasets. I couldn't even get acceptable accuracy on the Iris dataset (60% max). But I got 90% accuracy on the Digits dataset from scikit-learn with 6 classes.
\nYou have to experiment. Try Passive-Aggressive with many C
values. If it doesn't work, try SGD with varying momentum
and alpha
. Try repeating the training over the dataset 5 or 10 times.
In a future post I'll report my benchmarks so you can see what works for you and what doesn't.
\nThis is an emerging field for me, so I will need time to master it.
As always, you can find the example on Github, along with the dataset to experiment with.
\nThe post Incremental multiclass classification on microcontrollers: One vs One appeared first on Eloquent Arduino Blog.
\n", "content_text": "In earlier posts I showed you can run incremental binary classification on your microcontroller with Stochastic Gradient Descent or Passive-Aggressive classifier. Now it is time to upgrade your toolbelt with a new item: One-vs-One multiclass classifier.\n\nOne vs One\nMany classifiers are, by nature, binary: they can only distinguish the positive class from the negative one. Many of real-world problems, however, are multiclass: you have 3 or more possible outcomes to distinguish from.\nThere are a couple of ways to achieve this:\n\nOne vs All: if your classifier is able to output a confidence score of its prediction, for N classes you train N classifiers, each able to recognize a single class. During inference, you pick the "most confident" one.\nOne vs One: for N classes, you train N * (N-1) / 2 classifiers, one for each couple of classes. During inference, each classifier makes a prediction and you pick the class with the highest number of votes.\n\nSince SGD and Passive-Aggressive don't output a confidence score, I implemented the One vs One algorithm to tackle the multiclass classification problem on microcontrollers.\nActually, One vs One is not a new type of classifier: it is really a "coordinator" class that sorts which samples go to which classifier. You can still choose your own classifier type to use.\nAs SGD and Passive-Aggressive, OneVsOne implements the classifier interface, so you will use the well known fitOne and predict methods.\n\r\n\r\n\r\n \r\n\tFinding this content useful?\r\n\r\n\t\r\n\r\n\t\r\n\t\t\r\n\t\t\r\n\t \r\n \r\n \r\n \r\n\r\n\r\n\r\n\nExample code\n// Esp32 has some problems with min/max\n#define min(a, b) (a) < (b) ? (a) : (b)\n#define max(a, b) (a) > (b) ? 
(a) : (b)\n// you will actually need only one of SGD or PassiveAggressive\n#include "EloquentSGD.h"\n#include "EloquentPassiveAggressive.h"\n#include "EloquentOneVsOne.h"\n#include "EloquentAccuracyScorer.h"\n// this file defines NUM_FEATURES, NUM_CLASSES, TRAIN_SAMPLES and TEST_SAMPLES\n#include "dataset.h"\n\nusing namespace Eloquent::ML;\n\nvoid setup() {\n Serial.begin(115200);\n delay(3000);\n}\n\nvoid loop() {\n AccuracyScorer scorer;\n // OneVsOne needs the actual classifier class, the number of features and the number of classes\n OneVsOne<SGD<FEATURES_DIM>, FEATURES_DIM, NUM_CLASSES> clf;\n\n // clf.set() propagates the configuration to the actual classifiers\n // if a parameter does not exists on the classifier, it does nothing\n // in this example, alpha and momentum refer to SGD, C to Passive-Aggressive\n clf.set("alpha", 1);\n clf.set("momentum", 0.7);\n clf.set("C", 0.1);\n\n // fit\n // I noticed that repeating the training a few times over the same dataset increases performance to a certain extent: if you re-train it too much, performance will decay\n for (unsigned int i = 0; i < TRAIN_SAMPLES * 5; i++) {\n clf.fitOne(X_train[i % TRAIN_SAMPLES], y_train[i % TRAIN_SAMPLES]);\n }\n\n // predict\n for (int i = 0; i < TEST_SAMPLES; i++) {\n int y_true = y_test[i];\n int y_pred = clf.predict(X_test[i]);\n\n Serial.print("Predicted ");\n Serial.print(y_pred);\n Serial.print(" vs ");\n Serial.println(y_true);\n scorer.scoreOne(y_true, y_pred);\n }\n\n Serial.print("Accuracy = ");\n Serial.print(scorer.accuracy() * 100);\n Serial.print(" out of ");\n Serial.print(scorer.support());\n Serial.println(" samples");\n delay(30000);\n}\nIf you refer to the previous posts on SGD and Passive-Aggressive, you'll notice that you would be able to replace one with the other and your code will change by 1 single line only. 
This let's you experiment to find the best configuration for your project without hassle.\nAccuracy\nWell, accuracy vary.\nIn my tests, I couldn't get predictable accuracy on all datasets. I couldn't even get acceptable accuracy on the Iris dataset (60% max). But I got 90% accuracy on the Digits dataset from scikit-learn with 6 classes.\nYou have to experiment. Try Passive-Aggressive with many C values. If it doesn't work, try SGD with varying momentum and alpha. Try to repeat the training over the dataset 5, 10 times.\nIn a next post I'll report my benchmarks so you can see what works for you and what not.\nThis is an emerging field for me, so I will need time to master it.\n\nAs always, you can find the examle on Github with a the dataset to experiment with.\nL'articolo Incremental multiclass classification on microcontrollers: One vs One proviene da Eloquent Arduino Blog.", "date_published": "2020-04-26T10:01:14+02:00", "date_modified": "2020-04-26T11:52:29+02:00", "authors": [ { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" } ], "author": { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" }, "tags": [ "incremental-learning", "microml", "ml", "Arduino Machine learning" ] }, { "id": "https://eloquentarduino.github.io/?p=1062", "url": "https://eloquentarduino.github.io/2020/04/stochastic-gradient-descent-on-your-microcontroller/", "title": "Stochastic Gradient Descent on your microcontroller", "content_html": "Stochastic gradient descent is a well know algorithm to train classifiers in an incremental fashion: that is, as training samples become available. This saves you critical memory on tiny devices while still achieving top performance! Now you can use it on your microcontroller with ease.
\n\n
If you have ever worked with Machine learning, you surely know about Gradient descent: it is an iterative algorithm to optimize a loss function.
\nIt is very general-purpose, in the sense that it is not bound to a particular application, but it has been heavily used in Neural networks in recent years.
\nYet, it can be used as a classifier on its own if you set its loss function as the classification error.
\n\nThis is the core update rule of Gradient descent: quite simple.
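A common way to write this batch update, assuming a linear hypothesis $h_\theta$, a learning rate $\alpha$ and $m$ training samples (the symbols follow the usual textbook notation and are an assumption here), is:

```latex
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```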
\nAs you see, there's a summation in the formula: this means we need to cycle through the entire training set to compute the update to the weights.
\nIn case of large datasets, this can be slow or not possible at all.
\nAnd it requires a lot of memory.
\nAnd we don't have much memory on microcontrollers.
\nSo we need Stochastic gradient descent.
\nStochastic gradient descent has the exact same update rule, but it is applied to a single training sample at a time.
\nImagine the summation goes from 1 to 1, instead of m.
\nThat's it.
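As a rough sketch of what a per-sample update can look like in code, the snippet below trains a binary linear classifier one sample at a time. `TinySGD`, its fields and the default learning rate are hypothetical names made up for illustration; the rule shown is a simple perceptron-style update on the classification error, not necessarily what the EloquentSGD library does internally.

```cpp
#include <stddef.h>

// Hypothetical helper for illustration only (not part of the Eloquent library):
// a binary linear classifier updated from one sample at a time.
template <size_t N>
struct TinySGD {
    float weights[N] = {0};
    float bias = 0;
    float alpha = 0.1f;  // learning rate (assumed value)

    // raw linear output: w . x + b
    float score(const float (&x)[N]) const {
        float s = bias;
        for (size_t j = 0; j < N; j++) s += weights[j] * x[j];
        return s;
    }

    // class 1 if the score is positive, class 0 otherwise
    int predict(const float (&x)[N]) const {
        return score(x) > 0 ? 1 : 0;
    }

    // one update from one sample: there is no summation over the dataset,
    // so nothing but the weights has to be kept in memory
    void fitOne(const float (&x)[N], int y) {
        float error = predict(x) - y;  // -1, 0 or +1
        for (size_t j = 0; j < N; j++) weights[j] -= alpha * error * x[j];
        bias -= alpha * error;
    }
};
```

Calling `fitOne` once per incoming sample is all the training loop needs, which is why this style of learning suits devices that cannot hold the whole dataset in RAM.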
\nThe pattern of use is similar to that of the Passive Aggressive classifier: you have the fitOne
and predict
methods.
First of all, download the library from Github.
\n#include <EloquentSGD.h>\n#include <EloquentAccuracyScorer.h>\n#include "iris.h"\n\n#define VERBOSE\n\nusing namespace Eloquent::ML;\n\nvoid setup() {\n Serial.begin(115200);\n delay(3000);\n}\n\nvoid loop() {\n int trainSamples;\n int retrainingCycles;\n SGD<FEATURES_DIM> clf;\n AccuracyScorer scorer;\n\n // ....\n\n // train\n for (uint16_t cycle = 0; cycle < retrainingCycles; cycle++)\n for (uint16_t i = 0; i < trainSamples; i++)\n clf.fitOne(X[i], y[i]);\n\n // predict\n for (uint16_t i = trainSamples; i < DATASET_SIZE; i++) {\n int predicted = clf.predict(X[i]);\n int actual = y[i];\n\n scorer.scoreOne(actual, predicted);\n }\n\n Serial.print("Accuracy: ");\n Serial.print(round(100 * scorer.accuracy()));\n Serial.print("% out of ");\n Serial.print(scorer.support());\n Serial.println(" predictions");\n}
\nIn this case we're working with known datasets, so we cycle through them for the training, but if you're learning "on-line", from samples generated over time, it will work exactly the same.
\nStochastic gradient descent works quite well out of the box in most cases.
\nSometimes, however, its updates can start "oscillating".
\n\nTo solve this problem, the momentum technique has been proposed: it can both speed up learning and increase accuracy.
\nIn my personal tests, I was able to gain up to 5% in accuracy on the majority of datasets.
\nTo use it, you only need to set a decay factor between 0 and 1.
\nSGD clf;\n\nclf.momentum(0.5);
\nOn Github you can find the full example with some benchmark datasets to try on your own.
\nThe example is interactive and will ask you how many samples to use for the training and how many times to cycle through them.
\nThis is something you should consider: if you have a training set and can store it somehow (in memory or on Flash for example), re-presenting the same samples to the SGD classifier could (and probably will) increase its performance if done correctly.
\nThis happens because the algorithm needs some time to converge and if it doesn't receive enough samples it won't learn properly.
\nOf course, if you re-use the same samples over and over again, you're likely to overfit.
\nThe post Stochastic Gradient Descent on your microcontroller appeared first on Eloquent Arduino Blog.
\n", "content_text": "Stochastic gradient descent is a well-known algorithm to train classifiers in an incremental fashion: that is, as training samples become available. This saves you critical memory on tiny devices while still achieving top performance! Now you can use it on your microcontroller with ease.\n\nA brief recap on Stochastic Gradient Descent\nIf you have ever worked with Machine learning, you surely know about Gradient descent: it is an iterative algorithm to optimize a loss function. \nIt is very general-purpose, in the sense that it is not bound to a particular application, but it has been heavily used in Neural networks in recent years.\nYet, it can be used as a classifier on its own if you set its loss function as the classification error.\n\nThis is the core update rule of Gradient descent: quite simple.\nAs you see, there's a summation in the formula: this means we need to cycle through the entire training set to compute the update to the weights.\nIn case of large datasets, this can be slow or not possible at all.\nAnd it requires a lot of memory.\nAnd we don't have much memory on microcontrollers.\nSo we need Stochastic gradient descent.\nStochastic gradient descent has the exact same update rule, but it is applied to a single training sample at a time.\nImagine the summation goes from 1 to 1, instead of m.\nThat's it.\n\nHow to use\nThe pattern of use is similar to that of the Passive Aggressive classifier: you have the fitOne and predict methods.\nFirst of all, download the library from Github.\n#include <EloquentSGD.h>\n#include <EloquentAccuracyScorer.h>\n#include "iris.h"\n\n#define VERBOSE\n\nusing namespace Eloquent::ML;\n\nvoid setup() {\n Serial.begin(115200);\n delay(3000);\n}\n\nvoid loop() {\n int trainSamples;\n int retrainingCycles;\n SGD<FEATURES_DIM> clf;\n AccuracyScorer scorer;\n\n // ....\n\n // train\n for (uint16_t cycle = 0; cycle < retrainingCycles; cycle++)\n for (uint16_t i = 0; i < trainSamples; i++)\n clf.fitOne(X[i], y[i]);\n\n // 
predict\n for (uint16_t i = trainSamples; i < DATASET_SIZE; i++) {\n int predicted = clf.predict(X[i]);\n int actual = y[i];\n\n scorer.scoreOne(actual, predicted);\n }\n\n Serial.print("Accuracy: ");\n Serial.print(round(100 * scorer.accuracy()));\n Serial.print("% out of ");\n Serial.print(scorer.support());\n Serial.println(" predictions");\n}\nIn this case we're working with known datasets, so we cycle through them for the training, but if you're learning "on-line", from samples generated over time, it will work exactly the same.\nA bit of momentum\nStochastic gradient descent works quite well out of the box in most cases.\nSometimes, however, its updates can start "oscillating".\n\nTo solve this problem, the momentum technique has been proposed: it can both speed up learning and increase accuracy.\nIn my personal tests, I was able to gain up to 5% in accuracy on the majority of datasets.\nTo use it, you only need to set a decay factor between 0 and 1.\nSGD<FEATURES_DIM> clf;\n\nclf.momentum(0.5);\nRun on your own\nOn Github you can find the full example with some benchmark datasets to try on your own.\nThe example is interactive and will ask you how many samples to use for the training and how many times to cycle through them.\nThis is something you should consider: if you have a training set and can store it somehow (in memory or on Flash for example), re-presenting the same samples to the SGD classifier could (and probably will) increase its performance if done correctly.\nThis happens because the algorithm needs some time to converge and if it doesn't receive enough samples it won't learn properly.\nOf course, if you re-use the same samples over and over again, you're likely to overfit.\nThe post Stochastic Gradient Descent on your microcontroller appeared first on Eloquent Arduino Blog.", "date_published": "2020-04-10T19:43:45+02:00", 
"date_modified": "2020-04-12T19:31:52+02:00", "authors": [ { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" } ], "author": { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" }, "tags": [ "microml", "online-learning", "Arduino Machine learning" ] } ] }