Computer vision – Eloquent Arduino Blog
http://eloquentarduino.github.io/
Machine learning on Arduino, programming & electronics
Sun, 06 Dec 2020 08:31:20 +0000

Esp32-cam motion detection WITH PHOTO CAPTURE! (grayscale version)
https://eloquentarduino.github.io/2020/12/esp32-cam-motion-detection-with-photo-capture-grayscale-version/
Thu, 03 Dec 2020 17:50:59 +0000

The article Esp32-cam motion detection WITH PHOTO CAPTURE! (grayscale version) originally appeared on Eloquent Arduino Blog.

Do you want to turn your cheap esp32-cam into a DIY surveillance camera with motion detection AND photo capture?

Look no further: this post explains STEP-BY-STEP all you need to know to build one yourself!

Esp32-cam motion detection

As I told you in the Easier, faster pure video Esp32-cam motion detection post, motion detection on the esp32-cam seems to be the hottest topic on my blog, so I thought it deserved some more tutorials.

Without question, the #1 request you made in the comments was

How can I save the image that triggered the motion detection to the disk?

Well, in this post I will show you how to save the image to the SPIFFS filesystem your esp32-cam comes equipped with!

Motion detection, refactored

Please read the post on easier, faster esp32-cam motion detection first if you want to understand the following code.

It took me quite some time to write this post because I was struggling to design a clear, easy-to-use API for the motion detection feature and the image storage.

And I have to admit that, even after so long, I'm still not satisfied with the results.

Nonetheless, it works, and it works well in my opinion, so I will publish this and maybe get feedback from you to help me improve (so please leave a comment if you have any suggestions).

I won't bother you with the design considerations I made, since this is a hands-on tutorial, so let's take a look at the code to implement motion detection on the esp32-cam or any other esp32 with a camera attached (I'm using the M5Stick camera).

First of all, you need the EloquentVision library: you can install it either from Github or using the Arduino IDE's Library Manager.

Next, the code.

// Change according to your model
// The models available are
//   - CAMERA_MODEL_WROVER_KIT
//   - CAMERA_MODEL_ESP_EYE
//   - CAMERA_MODEL_M5STACK_PSRAM
//   - CAMERA_MODEL_M5STACK_WIDE
//   - CAMERA_MODEL_AI_THINKER
#define CAMERA_MODEL_M5STACK_WIDE

#include <FS.h>
#include <SPIFFS.h>
#include "EloquentVision.h"

// set the resolution of the source image and the resolution of the downscaled image for the motion detection
#define FRAME_SIZE FRAMESIZE_QVGA
#define SOURCE_WIDTH 320
#define SOURCE_HEIGHT 240
#define CHANNELS 1
#define DEST_WIDTH 32
#define DEST_HEIGHT 24
#define BLOCK_VARIATION_THRESHOLD 0.3
#define MOTION_THRESHOLD 0.2

// we're using the Eloquent::Vision namespace a lot!
using namespace Eloquent::Vision;
using namespace Eloquent::Vision::IO;
using namespace Eloquent::Vision::ImageProcessing;
using namespace Eloquent::Vision::ImageProcessing::Downscale;
using namespace Eloquent::Vision::ImageProcessing::DownscaleStrategies;

// an easy interface to capture images from the camera
ESP32Camera camera;
// the buffer to store the downscaled version of the image
uint8_t resized[DEST_HEIGHT][DEST_WIDTH];
// the downscaler algorithm
// for more details see https://eloquentarduino.github.io/2020/05/easier-faster-pure-video-esp32-cam-motion-detection
Cross<SOURCE_WIDTH, SOURCE_HEIGHT, DEST_WIDTH, DEST_HEIGHT> crossStrategy;
// the downscaler container
Downscaler<SOURCE_WIDTH, SOURCE_HEIGHT, CHANNELS, DEST_WIDTH, DEST_HEIGHT> downscaler(&crossStrategy);
// the motion detection algorithm
MotionDetection<DEST_WIDTH, DEST_HEIGHT> motion;

void setup() {
    Serial.begin(115200);
    SPIFFS.begin(true);
    camera.begin(FRAME_SIZE, PIXFORMAT_GRAYSCALE);
    motion.setBlockVariationThreshold(BLOCK_VARIATION_THRESHOLD);
}

void loop() {
    camera_fb_t *frame = camera.capture();

    // resize image and detect motion
    downscaler.downscale(frame->buf, resized);
    motion.update(resized);
    motion.detect();

    if (motion.ratio() > MOTION_THRESHOLD) {
        Serial.println("Motion detected");

        // here we want to save the image to disk
    }
}

Save image to disk

Fine, we can detect motion!

Now we want to save the triggering image to disk in a format that we can decode without any custom software. It would be cool if we could view the image using the standard ESP32 FSBrowser example sketch.

Thanks to the folks at Espressif, the esp32 is able to encode a raw image to JPEG format: it is convenient to use (any PC on earth can read a JPEG) and it is also fast.

(Thanks to reader ankaiser for pointing this out.)

It's really easy to do thanks to the EloquentVision library.

if (motion.ratio() > MOTION_THRESHOLD) {
    Serial.println("Motion detected");

    // quality ranges from 10 to 64 -> the higher, the more detailed
    uint8_t quality = 30;
    JpegWriter<SOURCE_WIDTH, SOURCE_HEIGHT> jpegWriter;
    File imageFile = SPIFFS.open("/capture.jpg", "wb");

    // opening can fail (e.g. filesystem not mounted or full)
    if (!imageFile) {
        Serial.println("Cannot open file for writing");
        return;
    }

    // it takes < 1 second for a 320x240 image and 4 Kb of space
    jpegWriter.writeGrayscale(imageFile, frame->buf, quality);
    imageFile.close();
}

Well done! Now your image is on the disk and can be downloaded with the FSBrowser sketch.

Now you have all the tools you need to create your own DIY surveillance camera with motion detection feature!

You can use it to catch thieves (though I discourage relying on such a rudimentary setup!), to capture images of wild animals in your garden (birds, squirrels or the like), or for any other application you see fit.

Further improvements

Of course, a proper motion detection setup should be more complex than the one presented here. Nevertheless, a few quick fixes can greatly improve the usability of this project with little effort. Here are three.

#1: Debouncing successive frames: the code presented in this post is a stripped-down version of a more complete esp32-cam motion detection example sketch.

That sketch implements a debouncing function to prevent writing "ghost images" (see the original post on motion detection for clear evidence of this effect).

#2: Proper file naming: the example sketch uses a fixed filename for the image. This means every new image overwrites the previous one, which may be undesirable depending on your requirements. A proper way to handle this would be to attach an RTC and name the image after the time it was captured (something like "motion_2020-12-03_08:09:10.jpg").

#3: RGB images: this is something I'm working on. The Bitmap writer is already there (so you could actually use it to store images on your esp32), but multi-channel motion detection is driving me crazy: I need some more time to design it the way I want, so stay tuned!
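Improvements #1 and #2 can be sketched in plain C++ so they also compile off-device. This is a minimal sketch under assumptions: the debouncer is fed the current time in milliseconds (e.g. `millis()`), and the file name is built from a `std::tm` that you fill from an RTC (or NTP). `CaptureDebouncer` and `captureFilename` are hypothetical helpers, not part of EloquentVision, and a simple cooldown is only one possible way to debounce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <string>

// Hypothetical helper: ignore detections that arrive within `cooldownMs`
// of the last saved capture, so a single event is not saved many times.
class CaptureDebouncer {
public:
    explicit CaptureDebouncer(uint32_t cooldownMs) : _cooldownMs(cooldownMs) {}

    // Returns true if a capture should be saved at time `nowMs`.
    bool shouldCapture(uint32_t nowMs) {
        // unsigned subtraction keeps working when the millisecond counter wraps
        if (_hasCaptured && nowMs - _lastMs < _cooldownMs)
            return false;
        _hasCaptured = true;
        _lastMs = nowMs;
        return true;
    }

private:
    uint32_t _cooldownMs;
    uint32_t _lastMs = 0;
    bool _hasCaptured = false;
};

// Hypothetical helper: build a timestamped name such as "/20201203_080910.jpg".
std::string captureFilename(const std::tm &t) {
    char name[32];
    const size_t written = std::strftime(name, sizeof(name), "/%Y%m%d_%H%M%S.jpg", &t);
    assert(written > 0);  // 0 means the buffer was too small
    return std::string(name, written);
}
```

In `loop()` you would call `debouncer.shouldCapture(millis())` inside the motion branch and pass `captureFilename(now).c_str()` to `SPIFFS.open()`. The name avoids colons so the downloaded file is valid on every OS, and it is kept short because SPIFFS has a small path-length limit.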


I hope you enjoyed this tutorial on esp32-cam motion detection with photo capture. It was born in response to your requests, so don't be afraid to ask me anything: I will do my best to help you!

Easier, faster pure video ESP32 cam motion detection
https://eloquentarduino.com/projects/esp32-arduino-motion-detection
Sun, 10 May 2020 19:26:08 +0000

The article Easier, faster pure video ESP32 cam motion detection originally appeared on Eloquent Arduino Blog.

If you liked my post about ESP32 cam motion detection, you'll love this updated version: it's easier to use and blazing fast!

Faster motion detection

The post about pure video ESP32 cam motion detection without an external PIR is my most successful post at the moment. Many of you are interested in this topic.

One of my readers, though, pointed out that my implementation was quite slow: he only achieved a bare 5 fps in his project. So he asked for a better alternative.

Since the post was of great interest for many people, I took the time to revisit the code and make improvements.

I came up with a 100% rewrite that is both easier to use and faster. Actually, it is blazing fast!

Let's see how it works.

Downsampling

In the original post I introduced the idea of downsampling the image from the camera for a faster and more robust motion detection. I wrote the code in the main sketch to keep it self-contained.

Looking back now it was a poor choice, since it cluttered the project and distracted from the main purpose, which is motion detection.

Moreover, I thought that scanning the image buffer in sequential order would be the fastest approach.

It turns out I was wrong.

This time I scan the image buffer following the blocks that will compose the resulting image and the results are... much faster.

Also, I decided to squeeze out some more efficiency to further speed up the computation: using different strategies for downsampling.

The idea of downsampling is that you have to "collapse" an NxN block from the original image into just one pixel of the resulting image.

Now, there are a variety of ways you can accomplish this. The first two I present here are the most obvious; the others are my own "invention": nothing fancy nor new, but they're fast and serve the purpose well.

Nearest neighbor

You can just pick the center of the NxN block and use its value for the output.
Of course it is fast (possibly the fastest approach), but it isn't very accurate: one pixel out of NxN is hardly representative of the whole region and suffers heavily from noise.

Figure: Nearest neighbor block averaging
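A sketch of the nearest neighbor strategy in plain C++, assuming a row-major grayscale image with one byte per pixel and a row stride of `width`. The function name and signature are illustrative, not the EloquentVision API. It reads a single pixel per block, which is why it is so cheap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Nearest neighbor: return the pixel at the center of the block x block
// region whose top-left corner is (x0, y0).
uint8_t nearestBlock(const uint8_t *img, size_t width,
                     size_t x0, size_t y0, size_t block) {
    assert(block > 0);
    return img[(y0 + block / 2) * width + (x0 + block / 2)];
}
```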

Full block average

This is the most intuitive alternative: use the average of all the pixels in the block as the output value. This is arguably the "proper" way to do it, since you're using all the pixels in the source image to compute the new one.

Figure: Full block averaging
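The full block average in plain C++, under the same assumptions (row-major, one byte per pixel, stride `width`; illustrative name). It touches all N x N pixels of every block, which is consistent with it being the slowest strategy in the benchmarks. Integer division truncates the average.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Full block average: every pixel of the block x block region contributes.
uint8_t fullBlockAverage(const uint8_t *img, size_t width,
                         size_t x0, size_t y0, size_t block) {
    assert(block > 0);
    uint32_t sum = 0;
    for (size_t y = y0; y < y0 + block; y++)
        for (size_t x = x0; x < x0 + block; x++)
            sum += img[y * width + x];
    return static_cast<uint8_t>(sum / (block * block));
}
```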

Core block average

As a faster alternative, I thought that averaging only the "core" (the innermost part) of the block would be a good-enough solution. There is no theoretical proof that this holds, but our task here is to create a smaller representation of the original image, not to produce an accurate smaller version.

Figure: Core block averaging
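A plain C++ sketch of the core block average. The post doesn't say how big the "core" is, so this assumes it is the central square left after dropping `block / 4` pixels on each side (a 2x2 core for a 4x4 block, 6x6 for a 10x10 block); the function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Core block average: average only the central part of the block.
// Assumption: the core excludes a margin of block / 4 pixels on every side.
uint8_t coreBlockAverage(const uint8_t *img, size_t width,
                         size_t x0, size_t y0, size_t block) {
    assert(block > 0);
    const size_t margin = block / 4;
    const size_t side = block - 2 * margin;
    uint32_t sum = 0;
    for (size_t y = y0 + margin; y < y0 + block - margin; y++)
        for (size_t x = x0 + margin; x < x0 + block - margin; x++)
            sum += img[y * width + x];
    return static_cast<uint8_t>(sum / (side * side));
}
```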

I'll stress this point: the only reason we do downsampling is to compare two sequential frames and detect whether they differ above a certain threshold. This downsampling doesn't have to mimic the actual image: it can transform the source in any fancy way, as long as it stays consistent and captures the variations over time.

Cross block average

This time we consider all the pixels along the vertical and horizontal central axes. The idea is that you capture a good portion of the variation along both axes, giving fairly accurate results.

Figure: Cross block averaging
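A plain C++ sketch of the cross block average, under the same image assumptions as the strategies above. For even block sizes it assumes the "central" row and column sit at offset `block / 2`, and it counts the shared center pixel only once, so each block costs 2N - 1 reads instead of N x N. The function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Cross block average: average the central row and the central column.
uint8_t crossBlockAverage(const uint8_t *img, size_t width,
                          size_t x0, size_t y0, size_t block) {
    assert(block > 0);
    const size_t cx = x0 + block / 2;  // central column
    const size_t cy = y0 + block / 2;  // central row
    uint32_t sum = 0;
    for (size_t i = 0; i < block; i++) {
        sum += img[cy * width + (x0 + i)];      // horizontal axis
        if (y0 + i != cy)                       // don't count the center twice
            sum += img[(y0 + i) * width + cx];  // vertical axis
    }
    return static_cast<uint8_t>(sum / (2 * block - 1));
}
```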

Diagonal block average

This alternative also came to me out of nowhere, really. I just think it is a good way to capture all the block's variation, probably even better than the vertical and horizontal directions.

Figure: Diagonal block averaging
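A plain C++ sketch of the diagonal block average, again assuming a row-major, one-byte-per-pixel image. It walks the main diagonal and the anti-diagonal; for odd block sizes the two share the center pixel, which is counted once. The function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Diagonal block average: average the two diagonals of the block.
uint8_t diagonalBlockAverage(const uint8_t *img, size_t width,
                             size_t x0, size_t y0, size_t block) {
    assert(block > 0);
    uint32_t sum = 0;
    size_t count = 0;
    for (size_t i = 0; i < block; i++) {
        const size_t y = y0 + i;
        sum += img[y * width + (x0 + i)];  // main diagonal
        count++;
        const size_t j = block - 1 - i;    // anti-diagonal column offset
        if (j != i) {                      // odd blocks: skip the shared center
            sum += img[y * width + (x0 + j)];
            count++;
        }
    }
    return static_cast<uint8_t>(sum / count);
}
```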

Implement your own

Not satisfied with the methods above? No problem, you can still implement your own.

The ones presented above are just some algorithms that came to my mind: I'm not telling you they're the best.

They worked for me, that's it.

If you think you found a better solution, I encourage you to implement it and even share it with me and the other readers, so we can all make progress on this together.
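One way to make strategies pluggable is to give them all the same signature and let a generic driver walk the blocks. This is a sketch, not the EloquentVision API: `BlockStrategy`, `downscale()` and the example `maxBlock()` strategy (which keeps the brightest pixel of each block) are hypothetical. The driver assumes the block size divides both dimensions, as 320x240 with blocks of 10 does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Shared signature for a block strategy (hypothetical).
using BlockStrategy = uint8_t (*)(const uint8_t *img, size_t width,
                                  size_t x0, size_t y0, size_t block);

// Apply `strategy` to every block of a srcW x srcH grayscale image and write
// the (srcW / block) x (srcH / block) result into `dst`, row by row.
void downscale(const uint8_t *src, size_t srcW, size_t srcH, size_t block,
               BlockStrategy strategy, uint8_t *dst) {
    assert(block > 0 && srcW % block == 0 && srcH % block == 0);
    const size_t dstW = srcW / block;
    for (size_t by = 0; by < srcH / block; by++)
        for (size_t bx = 0; bx < dstW; bx++)
            dst[by * dstW + bx] = strategy(src, srcW, bx * block, by * block, block);
}

// Example custom strategy: the brightest pixel of the block.
uint8_t maxBlock(const uint8_t *img, size_t width,
                 size_t x0, size_t y0, size_t block) {
    uint8_t best = 0;
    for (size_t y = y0; y < y0 + block; y++)
        for (size_t x = x0; x < x0 + block; x++)
            if (img[y * width + x] > best)
                best = img[y * width + x];
    return best;
}
```

Any function with the same signature, such as one of the averaging strategies above, can be passed in place of `maxBlock`.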


Benchmarks

So, at the very beginning I said this new implementation is blazingly fast.

How fast?

As fast as it can be, arguably.

I mean, so fast it won't alter your fps.

Look at the results I got on my M5Stack camera.

Algorithm          Time to execute (microseconds)   FPS
None                                            0    25
Nearest neighbor                              160    25
Cross block                                   700    25
Core block                                    800    25
Diagonal block                                950    25
Full block                                   4900    12

As you can see, only the full block creates a delay in the process (quite a bit of delay even): the other methods won't slow down your program in any noticeable way.

If you test Nearest neighbor and it works for you, then you'll be extremely light on computation resources with only 160 microseconds of delay.

This is what I mean by blazing fast.

Motion detection

The motion detection part hasn't changed, so I point you to the original post to read more about the Block difference threshold and the Image difference threshold.

Full code

#define CAMERA_MODEL_M5STACK_WIDE
#include "EloquentVision.h"

#define FRAME_SIZE FRAMESIZE_QVGA
#define SOURCE_WIDTH 320
#define SOURCE_HEIGHT 240
#define BLOCK_SIZE 10
#define DEST_WIDTH (SOURCE_WIDTH / BLOCK_SIZE)
#define DEST_HEIGHT (SOURCE_HEIGHT / BLOCK_SIZE)
#define BLOCK_DIFF_THRESHOLD 0.2
#define IMAGE_DIFF_THRESHOLD 0.1
#define DEBUG 0

using namespace Eloquent::Vision;

ESP32Camera camera;
uint8_t prevFrame[DEST_WIDTH * DEST_HEIGHT] = { 0 };
uint8_t currentFrame[DEST_WIDTH * DEST_HEIGHT] = { 0 };

// function prototypes
bool motionDetect();
void updateFrame();

/**
 * Initialize the serial port and the camera (grayscale)
 */
void setup() {
    Serial.begin(115200);
    camera.begin(FRAME_SIZE, PIXFORMAT_GRAYSCALE);
}

/**
 * Capture, downscale and compare each frame with the previous one
 */
void loop() {
    /**
     * Algorithm:
     *  1. grab frame
     *  2. compare with previous to detect motion
     *  3. update previous frame
     */

    unsigned long start = millis();
    camera_fb_t *frame = camera.capture();

    downscaleImage(frame->buf, currentFrame, nearest, SOURCE_WIDTH, SOURCE_HEIGHT, BLOCK_SIZE);

    if (motionDetect()) {
        Serial.print("Motion detected @ ");
        Serial.print(floor(1000.0f / (millis() - start)));
        Serial.println(" FPS");
    }

    updateFrame();
}

/**
 * Compute the number of different blocks
 * If there are enough, then motion happened
 */
bool motionDetect() {
    uint16_t changes = 0;
    const uint16_t blocks = DEST_WIDTH * DEST_HEIGHT;

    for (int y = 0; y < DEST_HEIGHT; y++) {
        for (int x = 0; x < DEST_WIDTH; x++) {
            float current = currentFrame[y * DEST_WIDTH + x];
            float prev = prevFrame[y * DEST_WIDTH + x];
            // guard against dividing by zero on black blocks (and on the very first frame)
            float delta = fabs(current - prev) / (prev > 0 ? prev : 1);

            if (delta >= BLOCK_DIFF_THRESHOLD)
                changes += 1;
        }
    }

    return (1.0 * changes / blocks) > IMAGE_DIFF_THRESHOLD;
}

/**
 * Copy current frame to previous
 */
void updateFrame() {
    memcpy(prevFrame, currentFrame, DEST_WIDTH * DEST_HEIGHT);
}

Check the full project code on Github and remember to star!

Easy Arduino thermal camera with (ASCII) video streaming
https://eloquentarduino.github.io/2020/02/easy-arduino-thermal-camera-with-ascii-video-streaming/
Sat, 29 Feb 2020 16:20:15 +0000

The article Easy Arduino thermal camera with (ASCII) video streaming originally appeared on Eloquent Arduino Blog.

Ever wanted to use your thermal camera with Arduino but found it difficult to go beyond the tutorials code? Let's see the easiest possible way to view your thermal camera streaming without an LCD display!

Figure: Arduino thermal image, RGB vs ASCII

MLX90640 thermal camera

For Arduino there are essentially two thermal cameras available: the AMG8833 and the MLX90640.

The AMG8833 is 8x8 and the MLX90640 is 32x24.

They're not cheap, it is true.

But if you have to spend money, I strongly advise you to buy the MLX90640: I have one and even it is not that accurate. I can't imagine how low-definition the AMG8833 must be.

If you want to actually get something meaningful from the camera, the AMG8833 won't give you any good results.

Sure, you can do interpolation: it gives you the impression of a higher definition, but you're just "inventing" values you don't actually have.

For demo projects it could be enough. But for any serious application, spend $20 more and buy an MLX90640.
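To make the "inventing values" point concrete, here is a minimal bilinear interpolation sketch in plain C++. Bilinear is an assumed, common choice (the post doesn't name a specific method) and `bilinear()` is an illustrative name. Every value between sensor pixels is just a weighted mix of the four nearest readings: no new temperature is measured.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Bilinear interpolation on a width x height grid of readings (row-major).
// (fx, fy) are fractional coordinates in [0, width - 1] x [0, height - 1].
float bilinear(const float *grid, size_t width, size_t height, float fx, float fy) {
    assert(width > 0 && height > 0);
    assert(fx >= 0 && fy >= 0 && fx <= width - 1 && fy <= height - 1);
    const size_t x0 = static_cast<size_t>(std::floor(fx));
    const size_t y0 = static_cast<size_t>(std::floor(fy));
    const size_t x1 = (x0 + 1 < width) ? x0 + 1 : x0;   // clamp at the right edge
    const size_t y1 = (y0 + 1 < height) ? y0 + 1 : y0;  // clamp at the bottom edge
    const float tx = fx - x0;
    const float ty = fy - y0;
    // interpolate horizontally on the two rows, then vertically between them
    const float top = grid[y0 * width + x0] * (1 - tx) + grid[y0 * width + x1] * tx;
    const float bottom = grid[y1 * width + x0] * (1 - tx) + grid[y1 * width + x1] * tx;
    return top * (1 - ty) + bottom * ty;
}
```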

MLX90640 eloquent library

As you may know if you read my previous posts, I strongly believe in "eloquent" code, that is code that's as easy as possible to read.

How many lines do you think you need to read an MLX90640 camera? Well, not that many in fact.

#include "EloquentMLX90640.h"

using namespace Eloquent::Sensors;

float buffer[768];
MLX90640 camera;

void setup() {
  Serial.begin(115200);

  if (!camera.begin()) {
    Serial.println("Init error");
    delay(50000);
  }
}

void loop() {
  camera.read(buffer);
  delay(3000);
}

If you skip the declaration lines, you only need a begin() and read() call.

That's it.

begin() runs all of the boilerplate code I mentioned earlier (checking the connection and initializing the parameters).

read() populates the buffer you pass as argument with the temperature readings.

From now on, you're free to handle that array however you like: this is the most flexible way for the library to handle any use case. It simply does not impose any restriction.
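As an example of handling the array yourself, a minimal plain C++ sketch that finds the hottest pixel. It assumes the 768 readings are stored row by row on the 32x24 grid and are in degrees Celsius; `findHottest` and `Hotspot` are hypothetical names, not part of the library.

```cpp
#include <cassert>
#include <cstddef>

// Position and value of the hottest pixel (hypothetical helper type).
struct Hotspot {
    size_t x, y;
    float celsius;
};

// Scan a width x height buffer of temperatures and return the maximum.
Hotspot findHottest(const float *readings, size_t width, size_t height) {
    assert(width > 0 && height > 0);
    size_t best = 0;
    for (size_t i = 1; i < width * height; i++)
        if (readings[i] > readings[best])
            best = i;
    return { best % width, best / width, readings[best] };
}
```

On the board you would call `findHottest(buffer, 32, 24)` right after `camera.read(buffer)`.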

You can find the camera code at the end of the page or on Github.

Printing as ASCII Art

Now that you have this data, you may want to actually "view" it. Well, that's not as easy a task as one may hope.

You will need an LCD if you want to create a standalone product. If you have one, that's the best option: it's a really cute project to build.

Here's a video from Adafruit that showcases even a 3D-printed case.

If you don't have an LCD, though, it is less practical to access your image.

I did this in the past, and it meant creating a Python script reading the serial port every second and updating a plot.
It works, sure, but it's not the most convenient way to handle it.

This is the reason I thought about ASCII art: it is used to draw images in plain text, so you can view them directly in the serial monitor.

Of course they will not be as accurate or representative as RGB images, but can give you an idea of what you're framing in realtime.
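The core idea fits in a few lines of plain C++: order characters from "empty" to "dense" and pick one proportionally to the pixel intensity. The glyph ramp and the `value * count / 256` mapping are the same ones used by the AsciiArt class listed at the end of this post; `toGlyph` and `renderRow` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Map a 0-255 intensity to a character, from "empty" (space) to "dense" ('X').
char toGlyph(uint8_t value) {
    static const char glyphs[] = " .,:;xyYX";
    const size_t count = sizeof(glyphs) - 1;  // 9 glyphs, excluding the '\0'
    // value * count / 256 is always in [0, count - 1]
    return glyphs[static_cast<size_t>(value) * count / 256];
}

// Render one row of pixels as a line of text.
std::string renderRow(const uint8_t *row, size_t width) {
    assert(row != nullptr || width == 0);
    std::string line;
    for (size_t x = 0; x < width; x++)
        line += toGlyph(row[x]);
    return line;
}
```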

I wrote a class to do this. Once imported in your sketch, it is super easy to get it working.

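#include "EloquentMLX90640.h"
#include "EloquentAsciiArt.h"

using namespace Eloquent::Sensors;
using namespace Eloquent::ImageProcessing;

float buffer[768];
uint8_t bufferBytes[768];
MLX90640 camera;
// we need to specify width and height of the image
AsciiArt<32, 24> art(bufferBytes);

void setup() {
  Serial.begin(115200);

  if (!camera.begin()) {
    Serial.println("Init error");
    delay(50000);
  }
}

void loop() {
  camera.read(buffer);

  // convert float image to uint8
  for (size_t i = 0; i < 768; i++) {
    // assumes readings are in the range 0-40 degrees (change as per your need):
    // values outside the range are clamped, otherwise they would wrap around
    // when stored in a uint8_t
    bufferBytes[i] = map(constrain(buffer[i], 0, 40), 0, 40, 0, 255);
  }

  // print to Serial with a border of 2 characters, to distinguish one image from the next
  art.print(&Serial, 2);
  delay(2000);
}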

As you can see, you need to create an AsciiArt object, map the image pixels in the range 0-255 and call the print() method: easy peasy!

You can find the ASCII art generator code at the end of the page or on Github.

Here's the result of the sketch. It's a video of me raising my arms above my head, one at a time, then standing up.

Resize the Serial Monitor so that only a single frame is visible at a time, to get a "video streaming" effect.

Of course the visual effect won't be as impressive as an RGB image, but you can clearly see my figure moving.

The really bad part is the "glitch" you see between frames when the scrolling happens: I don't know whether it's possible to mitigate it.


Check the full project code on Github


// EloquentMLX90640.h
#pragma once

#include "Wire.h"
#include "MLX90640_API.h"
#include "MLX90640_I2C_Driver.h"

#ifndef TA_SHIFT
//Default shift for MLX90640 in open air
#define TA_SHIFT 8
#endif

namespace Eloquent {
    namespace Sensors {

        enum class MLX90640Status {
            OK,
            NOT_CONNECTED,
            DUMP_ERROR,
            PARAMETER_ERROR,
            FRAME_ERROR
        };

        class MLX90640 {
        public:
            /**
             * @param address I2C address of the sensor (0x33 by default)
             */
            MLX90640(uint8_t address = 0x33) :
                _address(address),
                _status(MLX90640Status::OK) {

            }

            /**
             * Initialize I2C, check the connection and load the calibration parameters
             * @return true on success
             */
            bool begin() {
                Wire.begin();
                Wire.setClock(400000);

                return isConnected() && loadParams();
            }

            /**
             * Read a full frame of temperatures (in degrees Celsius)
             * @return true on success
             */
            bool read(float result[768]) {
                // each MLX90640_GetFrameData call returns a single subpage
                // (half of the pixels): read twice to update the whole frame
                for (byte x = 0 ; x < 2 ; x++) {
                    uint16_t frame[834];
                    int status = MLX90640_GetFrameData(_address, frame);

                    if (status < 0)
                        return fail(MLX90640Status::FRAME_ERROR);

                    float vdd = MLX90640_GetVdd(frame, &_params);
                    float Ta = MLX90640_GetTa(frame, &_params);
                    float tr = Ta - TA_SHIFT;
                    float emissivity = 0.95;

                    MLX90640_CalculateTo(frame, &_params, emissivity, tr, result);
                }

                return true;
            }

        protected:
            uint8_t _address;
            paramsMLX90640 _params;
            MLX90640Status _status;

            /**
             * Test if device is connected
             * @return
             */
            bool isConnected() {
                Wire.beginTransmission(_address);

                if (Wire.endTransmission() == 0) {
                    return true;
                }

                return fail(MLX90640Status::NOT_CONNECTED);
            }

            /**
             * Dump the sensor EEPROM and extract the calibration parameters
             * @return true on success
             */
            bool loadParams() {
                uint16_t ee[832];
                int status = MLX90640_DumpEE(_address, ee);

                if (status != 0)
                    return fail(MLX90640Status::DUMP_ERROR);

                status = MLX90640_ExtractParameters(ee, &_params);

                if (status != 0)
                    return fail(MLX90640Status::PARAMETER_ERROR);

                return true;
            }

            /**
             * Mark a failure
             * @param status
             * @return
             */
            bool fail(MLX90640Status status) {
                _status = status;

                return false;
            }
        };
    }
}
// EloquentAsciiArt.h
#pragma once

#include "Stream.h"

namespace Eloquent {
    namespace ImageProcessing {

        /**
         * Render a grayscale (0-255) image as ASCII art
         * @tparam width image width, in pixels
         * @tparam height image height, in pixels
         */
        template<size_t width, size_t height>
        class AsciiArt {
        public:
            AsciiArt(const uint8_t *data) {
                _data = data;
            }

            /**
             * Get pixel at given coordinates
             * @param x
             * @param y
             * @return
             */
            uint8_t at(size_t x, size_t y) {
                return _data[y * width + x];
            }

            /**
             * Print as ASCII art picture
             * @param stream
             */
            void print(Stream *stream, uint8_t frameSize = 0) {
                const char glyphs[] = " .,:;xyYX";
                const uint8_t glyphsCount = 9;

                printAsciiArtHorizontalFrame(stream, frameSize);

                for (size_t y = 0; y < height; y++) {
                    // vertical frame
                    for (uint8_t k = 0; k < frameSize; k++)
                        stream->print('|');

                    for (size_t x = 0; x < width; x++) {
                        const uint8_t glyph = floor(((uint16_t) at(x, y)) * glyphsCount / 256);

                        stream->print(glyphs[glyph]);
                    }

                    // vertical frame
                    for (uint8_t k = 0; k < frameSize; k++)
                        stream->print('|');

                    stream->print('\n');
                }

                printAsciiArtHorizontalFrame(stream, frameSize);
                stream->flush();
            }

        protected:
            const uint8_t *_data;

            /**
             * Print frameSize lines of '-' as a horizontal border
             * @param stream
             * @param frameSize
             */
            void printAsciiArtHorizontalFrame(Stream *stream, uint8_t frameSize) {
                for (uint8_t i = 0; i < frameSize; i++) {
                    for (size_t j = 0; j < width + 2 * frameSize; j++)
                        stream->print('-');
                    stream->print('\n');
                }
            }
        };
    }
}

Handwritten digit classification with Arduino and MicroML
https://eloquentarduino.github.io/2020/02/handwritten-digit-classification-with-arduino-and-microml/
Sun, 23 Feb 2020 10:53:03 +0000

The article Handwritten digit classification with Arduino and MicroML originally appeared on Eloquent Arduino Blog.

We continue exploring the endless possibilities on the MicroML (Machine Learning for Microcontrollers) framework on Arduino and ESP32 boards: in this post we're back to image classification. In particular, we'll distinguish handwritten digits using an ESP32 camera.

Figure: Arduino handwritten digit classification

If this is the first time you're reading my blog, you may have missed that I'm on a journey to push the limits of Machine learning on embedded devices like the Arduino boards and ESP32.

I started with accelerometer data classification, then did Wifi indoor positioning as a proof of concept.

In the last weeks, though, I took a more difficult path: image classification.

Image classification is where Convolutional Neural Networks really shine, but I'm here to question this assumption and demonstrate that it is possible to come up with much lighter alternatives.

In this post we continue with the examples, replicating a "benchmark" dataset in Machine learning: handwritten digit classification.

If you are curious about a specific image classification task you would like to see implemented, let me know in the comments: I'm always open to new ideas.

The task

The objective of this example is to tell which digit a handwritten number is, taking as input a photo from the ESP32 camera.

In particular, we have 3 handwritten numbers and the task of our model will be to distinguish which image is what number.

Figure: Handwritten digits example

I only have a single image per digit, but you're free to draw as many samples as you like: it should help improve the performance of your classifier.

1. Feature extraction

When dealing with images, if you use a CNN this step is often overlooked: CNNs are purpose-built to handle raw pixel values, so you just throw the image in and it is handled properly.

When using other types of classifiers, it can help to add a bit of feature engineering so the classifier can do its job and achieve high accuracy.

But not this time.

I wanted to be as "light" as possible in this demo, so I only took a few steps during feature extraction:

  1. use a grayscale image
  2. downsample to a manageable size
  3. convert it to black/white with a threshold
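The three steps above can be sketched in plain C++ for testing off-device. Step 1 is assumed to be done by the camera (`PIXFORMAT_GRAYSCALE`), so the function takes a grayscale buffer; `extractFeatures` is a hypothetical name. Comparing each block's sum with threshold x block² is equivalent to comparing its average with the threshold, without any division or rounding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Steps 2 and 3: average each block x block region of a width x height
// grayscale image, then turn each average into a 0/1 feature.
std::vector<uint8_t> extractFeatures(const uint8_t *gray, size_t width, size_t height,
                                     size_t block, uint8_t threshold) {
    assert(block > 0 && width % block == 0 && height % block == 0);
    const size_t w = width / block, h = height / block;
    std::vector<uint32_t> sums(w * h, 0);
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            sums[(y / block) * w + (x / block)] += gray[y * width + x];

    std::vector<uint8_t> features(w * h);
    for (size_t i = 0; i < w * h; i++)
        // average > threshold  <=>  sum > threshold * block * block
        features[i] = (sums[i] > static_cast<uint32_t>(threshold) * block * block) ? 1 : 0;
    return features;
}
```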

I would hardly call this feature engineering.

This is an example of the result of this pipeline.

Figure: Handwritten digit feature extraction

The code for this pipeline is really simple and is almost the same as in the example on motion detection.

#include "esp_camera.h"

#define PWDN_GPIO_NUM     -1
#define RESET_GPIO_NUM    15
#define XCLK_GPIO_NUM     27
#define SIOD_GPIO_NUM     22
#define SIOC_GPIO_NUM     23
#define Y9_GPIO_NUM       19
#define Y8_GPIO_NUM       36
#define Y7_GPIO_NUM       18
#define Y6_GPIO_NUM       39
#define Y5_GPIO_NUM        5
#define Y4_GPIO_NUM       34
#define Y3_GPIO_NUM       35
#define Y2_GPIO_NUM       32
#define VSYNC_GPIO_NUM    25
#define HREF_GPIO_NUM     26
#define PCLK_GPIO_NUM     21

#define FRAME_SIZE FRAMESIZE_QQVGA
#define WIDTH 160
#define HEIGHT 120
#define BLOCK_SIZE 5
#define W (WIDTH / BLOCK_SIZE)
#define H (HEIGHT / BLOCK_SIZE)
#define THRESHOLD 127

double features[H*W] = { 0 };

void setup() {
    Serial.begin(115200);
    Serial.println(setup_camera(FRAME_SIZE) ? "OK" : "ERR INIT");
    delay(3000);
}

void loop() {
    if (!capture_still()) {
        Serial.println("Failed capture");
        delay(2000);
        return;
    }

    print_features();
    delay(3000);
}

bool setup_camera(framesize_t frameSize) {
    camera_config_t config;

    config.ledc_channel = LEDC_CHANNEL_0;
    config.ledc_timer = LEDC_TIMER_0;
    config.pin_d0 = Y2_GPIO_NUM;
    config.pin_d1 = Y3_GPIO_NUM;
    config.pin_d2 = Y4_GPIO_NUM;
    config.pin_d3 = Y5_GPIO_NUM;
    config.pin_d4 = Y6_GPIO_NUM;
    config.pin_d5 = Y7_GPIO_NUM;
    config.pin_d6 = Y8_GPIO_NUM;
    config.pin_d7 = Y9_GPIO_NUM;
    config.pin_xclk = XCLK_GPIO_NUM;
    config.pin_pclk = PCLK_GPIO_NUM;
    config.pin_vsync = VSYNC_GPIO_NUM;
    config.pin_href = HREF_GPIO_NUM;
    config.pin_sscb_sda = SIOD_GPIO_NUM;
    config.pin_sscb_scl = SIOC_GPIO_NUM;
    config.pin_pwdn = PWDN_GPIO_NUM;
    config.pin_reset = RESET_GPIO_NUM;
    config.xclk_freq_hz = 20000000;
    config.pixel_format = PIXFORMAT_GRAYSCALE;
    config.frame_size = frameSize;
    config.jpeg_quality = 12;
    config.fb_count = 1;

    bool ok = esp_camera_init(&config) == ESP_OK;

    // the sensor handle is only valid if the initialization succeeded
    if (!ok)
        return false;

    sensor_t *sensor = esp_camera_sensor_get();
    sensor->set_framesize(sensor, frameSize);

    return true;
}

bool capture_still() {
    camera_fb_t *frame = esp_camera_fb_get();

    if (!frame)
        return false;

    // reset all the features
    for (size_t i = 0; i < H * W; i++)
      features[i] = 0;

    // for each pixel, compute the position in the downsampled image
    for (size_t i = 0; i < frame->len; i++) {
      const uint16_t x = i % WIDTH;
      const uint16_t y = floor(i / WIDTH);
      const uint8_t block_x = floor(x / BLOCK_SIZE);
      const uint8_t block_y = floor(y / BLOCK_SIZE);
      const uint16_t j = block_y * W + block_x;

      features[j] += frame->buf[i];
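    }

    // the pixels have been accumulated: give the frame buffer back
    // to the driver so it can be reused for the next capture
    esp_camera_fb_return(frame);

    // apply threshold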
    for (size_t i = 0; i < H * W; i++) {
      features[i] = (features[i] / (BLOCK_SIZE * BLOCK_SIZE) > THRESHOLD) ? 1 : 0;
    }

    return true;
}

void print_features() {
    for (size_t i = 0; i < H * W; i++) {
        Serial.print(features[i]);

        if (i != H * W - 1)
          Serial.print(',');
    }

    Serial.println();
}

2. Samples recording

To create your own dataset, you need a collection of handwritten digits.

You can do this part as you like, by using pieces of paper or a monitor. I used a tablet because it was well illuminated and I could open a bunch of tabs to keep a record of my samples.

As in the Apple vs Orange project, keep in mind that you should be consistent between the training phase and the inference phase.

This is why I used tape to fix my ESP32 camera to the desk and kept the tablet in the exact same position.

If you like, you could experiment with slightly varying the capture setup during training and check whether your classifier still achieves good accuracy: this is a test I didn't run.

3. Train and export the classifier

For a detailed guide refer to the tutorial

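import os
from glob import glob
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from micromlgen import port

def load_features(folder):
    # one file per class (file name = class name), one CSV feature vector per line
    rows, classmap = [], {}
    for idx, filename in enumerate(sorted(glob(os.path.join(folder, '*')))):
        classmap[idx] = os.path.splitext(os.path.basename(filename))[0]
        samples = np.loadtxt(filename, delimiter=',', ndmin=2)
        rows.append(np.hstack((samples, np.full((len(samples), 1), idx))))
    return np.vstack(rows), classmap

# put your samples in the dataset folder
features, classmap = load_features('dataset/')
X, y = features[:, :-1], features[:, -1].astype(int)
classifier = RandomForestClassifier(n_estimators=30, max_depth=10).fit(X, y)
c_code = port(classifier, classmap=classmap)
print(c_code)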

At this point, copy the printed code and paste it into a file called model.h in your Arduino project.

4. The result

Okay, at this point you should have all the working pieces to do handwritten digit image classification on your ESP32 camera. Include your model in the sketch and run the classification.

#include "model.h"

void loop() {
    if (!capture_still()) {
        Serial.println("Failed capture");
        delay(2000);

        return;
    }

    Serial.print("Number: ");
    Serial.println(classIdxToName(predict(features)));
    delay(3000);
}

Done.

You can see a demo of my results in the video below.

Project figures

My dataset is composed of 25 training samples in total and the SVM with linear kernel produced 17 support vectors.

On my M5Stick camera board, the model takes 6.8 KB of flash and inference takes 7 ms: not that bad!


Check the full project code on Github


]]>
Apple or Orange? Image recognition with ESP32 and Arduino https://eloquentarduino.github.io/2020/01/image-recognition-with-esp32-and-arduino/ Sun, 12 Jan 2020 10:32:08 +0000 https://eloquentarduino.github.io/?p=820 Do you have an ESP32 camera? Want to do image recognition directly on your ESP32, without a PC? In this post we'll look into a very basic image recognition task: distinguish apples from oranges with machine learning. Image recognition is a very hot topic these days in the AI/ML landscape. Convolutional Neural Networks really shines […]

The post Apple or Orange? Image recognition with ESP32 and Arduino appeared first on Eloquent Arduino Blog.

]]>
Do you have an ESP32 camera?

Want to do image recognition directly on your ESP32, without a PC?

In this post we'll look into a very basic image recognition task: distinguish apples from oranges with machine learning.

Apple vs Orange

Image recognition is a very hot topic these days in the AI/ML landscape. Convolutional Neural Networks (CNNs) really shine at this task and can achieve almost perfect accuracy in many scenarios.

Sadly, you can't run a typical CNN on your ESP32: it's just too large for a microcontroller.

Since in this series about Machine Learning on Microcontrollers we're exploring the potential of Support Vector Machines (SVMs) at solving different classification tasks, we'll take a look at image classification too.

What we're going to do

In a previous post about color identification with Machine Learning, we used an Arduino and a color sensor (TCS3200) to identify the object we were pointing at by its color: if we detected yellow, for example, we knew we had a banana in front of us.

Of course such a process is not object recognition at all: yellow may be a banana, or a lemon, or an apple.

Object inference, in that case, works only if you have exactly one object for a given color.

The objective of this post, instead, is to investigate if we can use the MicroML framework to do simple image recognition on the images from an ESP32 camera.

This is much more similar to the tasks you do on your PC with a CNN or any other kind of neural network you are comfortable with. Sure, we will still apply some restrictions to fit the problem on a microcontroller, but this is a huge step forward compared to simple color identification.

In this context, image recognition means deciding which class (among the trained ones) the current image belongs to. This algorithm can't locate interesting objects in the image, nor detect whether an object is present in the frame. It will classify the current image based on the samples recorded during training.

As in any self-respecting beginner image classification project, our task will be to distinguish an orange from an apple.

Features definition

I have to admit that I rarely use neural networks (NNs), so I may be wrong here, but from the examples I've read online, feature engineering doesn't seem to be a fundamental step with NNs.

The few times I used a CNN, I always fed it the whole image as-is. I didn't extract any features from it (e.g. a color histogram): the CNN worked perfectly fine with raw images.

I don't expect this to work as well with SVMs, but in this first post we're starting as simple as possible, so we'll be using the RGB components of the image as our features. In a future post, we'll introduce additional features to try to improve our results.

I said we're using the RGB components of the image. But not all of them.

Even at the lowest resolution of 160x120 pixels, a raw RGB image from the camera would generate 160x120x3 = 57600 features: way too many.

We need to reduce this number to the bare minimum.

How many pixels do you think are necessary to get reasonable results in this task of classifying apples and oranges?

You may be surprised to learn that I got 90% accuracy with an 8x6 RGB image!

You actually need very few pixels to do image classification

Yes, that's all we really need to do a good enough classification.




Of course this is a tradeoff: you can't expect to achieve 99% accuracy while keeping the model small enough to fit on a microcontroller. 90% is an acceptable accuracy for me in this context.

Keep in mind, moreover, that the feature vector size grows quadratically with the image side (if you keep the aspect ratio): a raw 8x6 RGB image generates 144 features, while a 16x12 image generates 576. That was already causing random crashes on my ESP32.
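The feature count is simply width × height × channels, so doubling both sides of the image multiplies it by four. A tiny compile-time check of the numbers above; `feature_count` is a name made up for this illustration:

```cpp
#include <cstddef>

// Number of features of a raw image: one value per pixel per channel
constexpr std::size_t feature_count(std::size_t width, std::size_t height, std::size_t channels) {
    return width * height * channels;
}

// The figures quoted in the text
static_assert(feature_count(160, 120, 3) == 57600, "full 160x120 RGB frame");
static_assert(feature_count(8, 6, 3) == 144, "8x6 RGB thumbnail");
static_assert(feature_count(16, 12, 3) == 576, "16x12 RGB thumbnail");

// Doubling width and height (same aspect ratio) quadruples the feature count
static_assert(feature_count(16, 12, 3) == 4 * feature_count(8, 6, 3), "quadratic growth");
```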

So we'll stick to 8x6 images.

Now, how do you compact a 160x120 image to 8x6? With downsampling.

This is the same technique we used in the post about motion detection on ESP32: we define a block size and average all the pixels inside each block to get a single value (you can refer to that post for more details).

Image downsampling example

This time, though, we're working with RGB images instead of grayscale, so we'll repeat the exact same process 3 times, one for each channel.

This is the code excerpt that does the downsampling.

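uint16_t rgb_frame[HEIGHT / BLOCK_SIZE][WIDTH / BLOCK_SIZE][3] = { 0 };

// buf and len are the frame buffer data and length (frame->buf, frame->len)
void grab_image(const uint8_t *buf, size_t len) {
    // reset the accumulators left over from the previous frame
    memset(rgb_frame, 0, sizeof(rgb_frame));

    for (size_t i = 0; i < len; i += 2) {
        // get r, g, b from the two RGB565 bytes (see "Extracting RGB components")
        const uint16_t pixel = (buf[i] << 8) | buf[i + 1];
        const uint8_t r = (pixel >> 11) & 0x1F;
        const uint8_t g = (pixel >> 5) & 0x3F;
        const uint8_t b = pixel & 0x1F;

        const size_t j = i / 2;
        // transform x, y in the original image to x, y in the downsampled image
        // by dividing by BLOCK_SIZE (integer division already rounds down)
        const uint16_t x = j % WIDTH;
        const uint16_t y = j / WIDTH;
        const uint8_t block_x = x / BLOCK_SIZE;
        const uint8_t block_y = y / BLOCK_SIZE;

        // accumulate the pixels of each block
        rgb_frame[block_y][block_x][0] += r;
        rgb_frame[block_y][block_x][1] += g;
        rgb_frame[block_y][block_x][2] += b;
    }

    // divide by the number of pixels in a block to get the average
    for (size_t y = 0; y < HEIGHT / BLOCK_SIZE; y++)
        for (size_t x = 0; x < WIDTH / BLOCK_SIZE; x++)
            for (size_t c = 0; c < 3; c++)
                rgb_frame[y][x][c] /= BLOCK_SIZE * BLOCK_SIZE;
}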


Extracting RGB components

The ESP32 camera can store the image in several formats. These are the ones we're interested in (there are a couple more available):

  1. grayscale: no color information, just the intensity is stored. The buffer has size HEIGHT * WIDTH
  2. RGB565: stores each RGB pixel in two bytes, with 5 bits for red, 6 for green and 5 for blue. The buffer has size HEIGHT * WIDTH * 2
  3. JPEG: encodes the image to JPEG (on the common OV2640 sensor, this happens in hardware). The buffer has a variable length, depending on the encoding results

For our purpose, we'll use the RGB565 format and extract the 3 components from the 2 bytes with the following code.

Code adapted from https://www.theimagingsource.com/support/documentation/ic-imaging-control-cpp/PixelformatRGB565.htm

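config.pixel_format = PIXFORMAT_RGB565;

// buf and len are the frame buffer data and length (frame->buf, frame->len)
for (size_t i = 0; i < len; i += 2) {
    // each pixel spans two bytes, most significant byte first
    const uint8_t high = buf[i];
    const uint8_t low  = buf[i+1];
    const uint16_t pixel = (high << 8) | low;

    // bit layout RRRRRGGG GGGBBBBB
    const uint8_t r = (pixel & 0b1111100000000000) >> 11;  // 0..31
    const uint8_t g = (pixel & 0b0000011111100000) >> 5;   // 0..63
    const uint8_t b = (pixel & 0b0000000000011111);        // 0..31
}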
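The loop above keeps the raw 5-bit and 6-bit values. If you want them on the usual 0..255 scale (for example, to compare them with colors picked on your PC), a common trick is to copy each channel's most significant bits into its low bits, so full intensity maps to exactly 255. A minimal sketch, assuming the same high-byte-first layout as above; `Rgb888` and `rgb565_to_rgb888` are names made up for this example:

```cpp
#include <cstdint>

// One pixel with 8 bits per channel
struct Rgb888 {
    uint8_t r, g, b;
};

// Decode an RGB565 pixel stored high byte first and expand every channel
// to 0..255 by replicating its top bits into the low bits.
constexpr Rgb888 rgb565_to_rgb888(uint8_t high, uint8_t low) {
    const uint16_t pixel = static_cast<uint16_t>((high << 8) | low);
    const uint8_t r5 = (pixel >> 11) & 0x1F;  // 5 bits of red
    const uint8_t g6 = (pixel >> 5) & 0x3F;   // 6 bits of green
    const uint8_t b5 = pixel & 0x1F;          // 5 bits of blue

    return Rgb888{
        static_cast<uint8_t>((r5 << 3) | (r5 >> 2)),
        static_cast<uint8_t>((g6 << 2) | (g6 >> 4)),
        static_cast<uint8_t>((b5 << 3) | (b5 >> 2)),
    };
}

// Pure red, green and blue map to 255 on their own channel
static_assert(rgb565_to_rgb888(0xF8, 0x00).r == 255, "red");
static_assert(rgb565_to_rgb888(0x07, 0xE0).g == 255, "green");
static_assert(rgb565_to_rgb888(0x00, 0x1F).b == 255, "blue");
```

For the classifier itself the raw 5/6-bit values work just as well, as long as training and inference use the same representation.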

Record sample images

Now that we can grab the images from the camera, we'll need to take a few samples of each object we want to recognize.

Before doing so, we'll linearize the image matrix to a 1-dimensional vector, because that's what our prediction function expects.

#define H (HEIGHT / BLOCK_SIZE)
#define W (WIDTH / BLOCK_SIZE)

// global, so the same vector can also be fed to the classifier later
double features[H*W*3] = {0};

void linearize_features() {
  size_t i = 0;

  // row by row, column by column: R, G, B of each block
  for (int y = 0; y < H; y++) {
    for (int x = 0; x < W; x++) {
      features[i++] = rgb_frame[y][x][0];
      features[i++] = rgb_frame[y][x][1];
      features[i++] = rgb_frame[y][x][2];
    }
  }

  // print to serial, one feature vector per line
  for (size_t k = 0; k < H*W*3; k++) {
    Serial.print(features[k]);
    Serial.print('\t');
  }

  Serial.println();
}
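If you inspect the printed vectors on your PC, it helps to know where each value ends up. Assuming the row-major order of linearize_features() above, block (y, x) and channel c (0 = R, 1 = G, 2 = B) land at index (y * W + x) * 3 + c. A tiny helper, with the hypothetical name `feature_index`, checked against the 8x6 grid used in this post:

```cpp
#include <cstddef>

// Position of block (y, x), channel c, in the flat feature vector:
// rows first, then columns, then the 3 channels of each block.
constexpr std::size_t feature_index(std::size_t y, std::size_t x, std::size_t c, std::size_t blocks_w) {
    return (y * blocks_w + x) * 3 + c;
}

// With an 8x6 grid (W = 8, H = 6)
static_assert(feature_index(0, 0, 0, 8) == 0, "first block, red");
static_assert(feature_index(0, 1, 0, 8) == 3, "second block of the first row, red");
static_assert(feature_index(1, 0, 0, 8) == 24, "first block of the second row, red");
static_assert(feature_index(5, 7, 2, 8) == 143, "last of the 144 features");
```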

Now you can set up your acquisition environment and take the samples: 15-20 of each object will do the job.

Image acquisition is a very noisy process: even keeping the camera still, you will get fluctuating values.
You need to be very accurate during this phase if you want to achieve good results.
I suggest you tape your camera to a flat surface or use some kind of camera stand.

Training the classifier

To train the classifier, save the features for each object in a file, one feature vector per line. Then follow the steps on how to train an ML classifier for Arduino to get the exported model.

You can experiment with different classifier configurations.

My features were well distinguishable, so I had great results (100% accuracy) with any kernel (even linear).

One odd thing happened with the RBF kernel: I had to use an extremely low gamma value (0.0000001). Can anyone explain why? I usually go with a default value of 0.001.

The model produced 13 support vectors.

I didn't apply any feature scaling: you could try it if you're classifying more than 2 classes and getting poor results.
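A likely explanation for the tiny gamma is the missing scaling: the RBF kernel is exp(-gamma * ||x - x'||^2), and with 144 raw, unscaled features the squared distance between two samples can be very large, so only a very small gamma keeps the kernel from collapsing to 0 for almost every pair. Scaling puts all features in the same small range. Below is a minimal min-max scaling sketch, assuming you compute the per-feature minimum and maximum on the training samples and apply the exact same transformation on the board before predicting (scikit-learn's `MinMaxScaler` does the PC-side equivalent). `MinMaxScaler` here is a hypothetical C++ struct, not part of any library:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Min-max scaling: maps every feature to [0, 1] using the minimum and maximum
// seen on the training samples. The same lo/hi values must be used on the board.
struct MinMaxScaler {
    std::vector<double> lo, hi;

    // Learn per-feature minimum and maximum from the training samples
    void fit(const std::vector<std::vector<double>> &samples) {
        assert(!samples.empty());
        lo = hi = samples[0];
        for (const auto &sample : samples) {
            assert(sample.size() == lo.size());
            for (std::size_t i = 0; i < sample.size(); i++) {
                if (sample[i] < lo[i]) lo[i] = sample[i];
                if (sample[i] > hi[i]) hi[i] = sample[i];
            }
        }
    }

    // Scale a feature vector in place; constant features become 0
    void transform(std::vector<double> &features) const {
        assert(features.size() == lo.size());
        for (std::size_t i = 0; i < features.size(); i++) {
            const double range = hi[i] - lo[i];
            features[i] = range > 0 ? (features[i] - lo[i]) / range : 0.0;
        }
    }
};
```

After scaling, gamma values closer to the usual defaults should become meaningful again, though you will still need to tune them.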

Apple vs Orange decision boundaries

Real world example

If you followed all the steps above, you should now have a model capable of detecting whether your camera is shooting an apple or an orange, as you can see in the following video.

The little white object you see at the bottom of the image is the camera, taped to the desk.

Did you think it was possible to do simple image classification on your ESP32?

Disclaimer

This is not full-fledged object recognition: it can't label objects as you walk around, as TensorFlow can, for example.

You have to carefully craft your setup and be as consistent as possible between training and inferencing.

Still, I think this is a fun proof-of-concept that can have useful applications in simple scenarios where you can live with a fixed camera and don't want to use a full Raspberry Pi.

In the coming weeks I plan to finally try TensorFlow Lite for Microcontrollers on my ESP32, so I'll compare it with this example and report my results.

Now that you can do image classification on your ESP32, can you think of a use case you will be able to apply this code to?

Let me know in the comments: we could even try to build it together if you need some help.


Check the full project code on Github


]]>
Motion detection with ESP32 cam only (Arduino version) https://eloquentarduino.com/projects/esp32-arduino-motion-detection Sun, 05 Jan 2020 11:08:08 +0000 https://eloquentarduino.github.io/?p=779 Do you have an ESP32 camera? Do you want to do motion detection WITHOUT ANY external hardware? Here's a tutorial made just for you: 30 lines of code and you will know when something changes in your video stream 🎥 ** See the updated version of this project: it's easier to use and waaay faster: […]

The post Motion detection with ESP32 cam only (Arduino version) appeared first on Eloquent Arduino Blog.

]]>
Do you have an ESP32 camera? Do you want to do motion detection WITHOUT ANY external hardware?

Here's a tutorial made just for you: 30 lines of code and you will know when something changes in your video stream 🎥

ESP32 camera motion detection example

** See the updated version of this project: it's easier to use and waaay faster: Easier, faster, pure video ESP32 cam motion detection **

What is (naive) motion detection?

Quoting from Wikipedia

Motion detection is the process of detecting a change in the position of an object relative to its surroundings or a change in the surroundings relative to an object

In this project, we're implementing what I call naive motion detection: that is, we're not focusing on a particular object and following its motion.

We'll only detect if any considerable portion of the image changed from one frame to the next.

We won't identify where the motion happened (that's the subject of a future project), nor what caused it. We will analyze the video stream in (almost) real time and compare it frame by frame: if lots of pixels changed, we'll call it motion.

Can't I use an external PIR?

Several projects on the internet about motion detection with an ESP32 cam use an external PIR sensor to trigger the video recording.

What's the problem with that approach?

1. External hardware

First of all, you need external hardware. If you're using a breadboard, no problem: you just need a couple more wires and you're good to go. But I have a nice M5Stick camera (no affiliate link) that's already well packaged, so adding a PIR sensor won't be that easy.

2. Field of View

PIR sensors have a limited FOV (field of view), so you will need more than one to cover the whole range of the camera.

My camera, for example, has a fish-eye lens that gives me a 160° field of view. Most cheap PIR sensors have a 120° field of view, so one will not suffice. This adds even more bulk to my project.

3. Cold objects

PIR sensors are triggered by infrared light, which is emitted by warm bodies (like people and animals).

But motion in a video stream can happen for a variety of reasons, not necessarily due to hot bodies, for example if you want to monitor a street for cars passing by.

A PIR sensor can't do this: video motion detection can.




Do you like the motion effect at the beginning of the post? Check it out on Github

What do you need?

All you need for this project is a board with a camera sensor. As I said, I have an M5Stick Camera with a fish-eye lens, but any ESP32-based camera should work out of the box:

  • ESP32 cam
  • ESP32 eye
  • TTGO camera
  • ... any other flavor of ESP32 camera

ESP32 camera models

How does it work?

Ok, let's go to the "technical" stuff.

Simply put, the algorithm counts the number of different pixels from one frame to the next: if many pixels changed, it will detect motion.

Well, it's almost like this.

Of course such an algorithm will be very sensitive to noise (which is quite high on these low-cost cameras). We need to mitigate false-positive triggers.

Downsampling

One super-simple and super-effective way of doing this is to work with blocks, instead of pixels. A block is simply an N x N square, whose value is the average of the pixels it contains.

This greatly reduces sensitivity to noise, providing a more robust detection. Here's an example of what the "block-ing" operation does to an image.

Image downsampling example

It's really a "pixelating" effect: you take the original image (let's say 320x240 pixels) and shrink it 10 times, to 32x24.

This has the added benefit that a 32x24 matrix is much more lightweight to work with than a 320x240 one: if you want to do real-time detection, this is a MUST.
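The block averaging described above can be sketched in plain C++, so you can test it on your PC before flashing. This is a minimal sketch, assuming a grayscale frame with one byte per pixel stored row by row, and image sides that are exact multiples of the block size; `downsample` is a name chosen for this example, not the project's API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Averages every block_size x block_size square of a grayscale image
// (one byte per pixel, stored row by row) into a single value.
std::vector<uint8_t> downsample(const uint8_t *pixels, std::size_t width,
                                std::size_t height, std::size_t block_size) {
    assert(block_size > 0 && width % block_size == 0 && height % block_size == 0);
    const std::size_t blocks_w = width / block_size;
    const std::size_t blocks_h = height / block_size;
    std::vector<uint32_t> sums(blocks_w * blocks_h, 0);

    // accumulate each pixel into the block it falls in
    for (std::size_t y = 0; y < height; y++)
        for (std::size_t x = 0; x < width; x++)
            sums[(y / block_size) * blocks_w + x / block_size] += pixels[y * width + x];

    // divide by the number of pixels per block to get the average
    std::vector<uint8_t> blocks(sums.size());
    for (std::size_t i = 0; i < sums.size(); i++)
        blocks[i] = static_cast<uint8_t>(sums[i] / (block_size * block_size));
    return blocks;
}
```

With a 320x240 frame and a block size of 10, this returns the 32x24 = 768 blocks discussed below.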

How should you choose the scale factor?

Well, it depends.

It depends on the sensitivity you want to achieve. The higher the downsampling, the less sensitive your detection will be.

If you want to detect a person passing 50cm away from the camera, you can increase this number without any problem. If you want to detect a dog 10m away, you should keep it in the 5-10 range.

Experiment with your own use case and tweak by trial and error.

Blocks difference threshold

Once we've defined the block size, we need to detect if a block changed from one frame to the next.

Of course, just testing for difference (current != prev) would be again too sensitive to noise. A block can change for a variety of reasons, the first of which is the bad camera quality.

So we instead define a percent threshold above which we can say for sure the block actually changed. A good starting point could be 10-20%, but again you need to tweak this to your needs.

The higher the threshold, the less sensitive the algorithm will be.

In code it is calculated as

float delta = fabs(currentBlockValue - prevBlockValue) / prevBlockValue;

which indicates the relative increment/decrement from the previous value (block values are floats here, and prevBlockValue must not be zero).
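Wrapped in a function, the per-block test could look like the sketch below. One assumption of mine that is not in the formula above: the previous value is clamped to at least 1, so a completely black block (value 0) doesn't cause a division by zero. `block_changed` is a hypothetical name:

```cpp
// Returns true if a block changed by more than block_threshold (e.g. 0.15f
// for 15%) relative to its value in the previous frame.
constexpr bool block_changed(float prev, float current, float block_threshold) {
    // clamp the denominator to at least 1 to avoid dividing by zero
    const float denominator = prev > 1.0f ? prev : 1.0f;
    const float difference = current > prev ? current - prev : prev - current;
    return difference / denominator > block_threshold;
}

// A 10% change stays below a 15% threshold, a 20% change goes above it
static_assert(!block_changed(100.0f, 110.0f, 0.15f), "10% increase");
static_assert(block_changed(100.0f, 120.0f, 0.15f), "20% increase");
```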

Image difference threshold

Now that we can detect if a block changed from one frame to the next, we can actually detect if the image changed.

You could decide to trigger motion even if a single block changed, but I suggest setting a higher value here.

Let's return to the 320x240 image example. With a 10x10 block, you'll be working with 32x24 = 768 blocks: will you call it "motion" if 1 out of 768 blocks changed value?

I don't think so. You want something more robust. You want 50 blocks to change. Or at least 20 blocks. If you do the math, 20 blocks out of 768 is only about 2.6% of the image, which is hardly noticeable.

If you want to be robust, don't set this threshold too low. Again, tweak it with real-world experiments.

In code it is calculated as:

float changedBlocksPercent = (float) changedBlocks / totalBlocks;

Combining all together

Recapping: when running the motion detection algorithm you have 3 parameters to set:

  1. the block size
  2. the block difference threshold
  3. the image difference threshold

Let's pick 3 sensible defaults: block size = 10, block threshold = 15%, image threshold = 20%.

What do these parameters translate to in practice?

They mean that motion will be detected if 20% of the image, averaged in blocks of 10x10, changed its value by at least 15% from one frame to the next.
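Putting the three parameters together, a whole-frame check could look like the following. It's a sketch, assuming both frames are already downsampled into one float per block, and with the previous value clamped to at least 1 to avoid dividing by zero; thresholds are fractions, so 15% is 0.15f. `motion_detected` is a hypothetical name:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive motion detection on two downsampled frames (one value per block).
// A block counts as changed when its relative change exceeds block_threshold;
// motion is reported when the fraction of changed blocks exceeds image_threshold.
bool motion_detected(const std::vector<float> &prev, const std::vector<float> &curr,
                     float block_threshold, float image_threshold) {
    assert(prev.size() == curr.size() && !curr.empty());
    std::size_t changed_blocks = 0;

    for (std::size_t i = 0; i < curr.size(); i++) {
        // clamp the denominator to at least 1 to avoid dividing by zero
        const float denominator = prev[i] > 1.0f ? prev[i] : 1.0f;
        if (std::fabs(curr[i] - prev[i]) / denominator > block_threshold)
            changed_blocks++;
    }

    // cast before dividing, otherwise integer division would truncate the ratio
    const float changed_fraction = static_cast<float>(changed_blocks) / curr.size();
    return changed_fraction > image_threshold;
}
```

With the defaults above, you would call motion_detected(prev, curr, 0.15f, 0.2f) on two 32x24 block grids, then keep curr as the new prev for the next frame.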

ESP32 camera motion example

As you can see, you don't need high-definition images to (naively) detect if something happened in the image. Large areas of motion will be easily detectable, even at very low resolution.

Real world example

Now the fun part. I'll show you how it performs on a real-world scenario.

To keep it simple, I wrote a sketch that does only motion detection, not video streaming over HTTP.

This means you won't be able to see the original image recorded from the camera. Nevertheless, I have kept the block size to a minimum to allow for the best quality possible.

This is me passing my arm in front of the camera a few times.

The grid you see represents the actual pixels used for the computation. Each cell corresponds to one pixel of the downscaled image.

The orange cells highlight the pixels that the algorithm sees as "different" from one frame to the next. As you can see, some pixels are detected even if no motion is happening. That's the noise I talked about multiple times during the post.

When I move my arm in the frame, you see lots of pixels become activated, so the "Motion" text appears.

While moving the arm, you may notice what I call the "ghost" effect. You actually see 2 regions of motion: one is where my arm is now, which of course changed. The other is the region where my arm was in the previous frame, which returned to its original content.

This is why I suggest keeping the image difference threshold high: when real motion happens, you will notice it anyway, because the activated region of the image will be bigger than the moving object itself.

Do you like the grid effect of the sample video? Let me know in the comments if you want me to share it.

Or even better: subscribe to the newsletter and you will get it directly in your inbox with my next mail.



Check the full project code on Github

Check out also the gist for the visualization tool


]]>