TinyML: Machine Learning for Embedded Systems — Part I
Building the model — How to build an intelligent system with Arduino and TensorFlow
Knowledge is having the right answer, Intelligence is asking the right question.
We are increasingly surrounded by small and intelligent objects: smart speakers, smart watches, smart bands, …
These “things” belong to the Internet of Things (IoT) class: non-standard computing devices that connect wirelessly to a network and are able to process and transmit data.
These devices are getting smarter and more connected, and the need to run more demanding jobs on them, such as machine learning tasks, keeps growing.
Tiny Machine Learning (TinyML) is a field at the intersection of embedded IoT systems and machine learning.
Machine Learning at the edge
TensorFlow is a free and open-source software library for machine learning.
The optimized and lightweight version of TensorFlow library for mobile and embedded devices is called TensorFlow Lite.
TensorFlow Lite (TFLite) is designed to make it easy to perform machine learning (in particular, deep learning tasks) on devices “at the edge” of the network, instead of sending data back and forth to a server.
It enables machine learning inference with low latency and a small program size on devices with limited compute and memory resources.
It works with a huge range of devices: from low-power microcontrollers (Arduino) to powerful mobile phones (Android, iOS).
The TFLite workflow
The TensorFlow Lite workflow involves the following steps:
- Train the model
A TensorFlow model is a data structure that contains the logic and knowledge of a machine learning network trained to solve a particular problem.
You cannot build a model using TensorFlow Lite: you must convert a regular TensorFlow model into the Lite format.
If your model will run on a microcontroller, you have to generate a small TensorFlow model that can fit your tiny device, accepting some limitations; TensorFlow Lite for Microcontrollers is designed for exactly these constraints.
If you are working on more powerful devices (e.g., a Raspberry Pi running Embedded Linux), the standard TensorFlow Lite framework might be easier to integrate.
- Convert and optimize the model
Convert the TensorFlow model into the TensorFlow Lite format using the TensorFlow Lite converter.
Converting models reduces their file size and introduces optimizations, sometimes with a small trade-off in accuracy.
- Deploy and run the model
Deploy and run your model on-device with the interpreter.
Inference: let’s do predictions!
Inference is the process of running data through a model to obtain predictions. It requires a model, an interpreter, and input data.
TensorFlow Lite is composed of two main components:
- Converter: converts TensorFlow models into an efficient form used by the interpreter.
- Interpreter: runs optimized models (aka, run inference) on mobile and embedded devices.
The interpreter can be configured to run on mobile platforms (Android, iOS), Embedded Linux, and microcontrollers.
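To make these two components concrete, here is a minimal sketch of how they appear in the Python API (assuming a trained Keras model named model; the conversion step is covered in detail in Part II):
import tensorflow as tf

# Converter: turn a trained Keras model into the compact TFLite format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Interpreter: load the converted model and get it ready to run inference
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()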
(Machine) Learning-by-doing
To better understand the principles of TensorFlow Lite and machine learning inference, we will build a simple model able to predict the trend of an unknown mathematical function based only on an input-output dataset.
The goal of this project is to train a model that takes a value x and predicts the result y, where y is the output of the mathematical function: y = sin(x) * sqrt(x).
In a real-world application, if you needed y, you could just calculate it directly with the function sin(x) * sqrt(x). But in this case you don't know the mathematical expression a priori: you only have the input (x) and output (y) pairs.
The first part of the project will be to train a model capable of approximating the result y, given as training dataset a "noisy" set composed of (x, y) pairs.
The second part will be to run this model on a tiny hardware device (Arduino).
Need a powerful machine? No, Colab!
Training a machine learning model requires high computational power.
So how do you build large machine learning models without “burning” your laptop? Google Colaboratory is the answer!
Google Colab allows you to run TensorFlow training in a web browser using the Python language:
- Why Python?
Python offers concise and readable code, which makes it easier to build machine learning models.
- Why a web browser?
Colab executes machine learning models on Google cloud servers, which means you can use powerful hardware just by typing Python code in a web browser.
Get ready to train
Before starting to build your first machine learning model, you need to:
- Create your first Google Colab notebook
- Setup your machine
Install the Python packages needed to use TensorFlow.
!pip install tensorflow==2.0.0-rc1
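A quick sanity check is to import the library and print the installed version:
import tensorflow as tf
print(tf.__version__)  # should print 2.0.0-rc1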
- Prepare your dataset
Create a noisy dataset that represents the trend of the unknown mathematical function.
In the sample below, the code generates 5000 values that represent random points along the sin(x) * sqrt(x) function.
In a real-world situation, we might be collecting data from sensors (e.g., climatic temperature measurements during a week).
import numpy as np

SAMPLES = 5000
# 5000 random points along sin(x) * sqrt(x), with Gaussian noise added
x_values = np.random.uniform(low=0, high=15, size=SAMPLES)
y_values = np.sin(x_values) * np.sqrt(x_values)
y_values_orig = y_values.copy()
y_values += 0.5 * np.random.randn(*y_values.shape)
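To visually check the dataset, you can plot the noisy samples (a minimal sketch using matplotlib, which is pre-installed on Colab):
import matplotlib.pyplot as plt

# Scatter plot of the noisy (x, y) pairs the model will learn from
plt.plot(x_values, y_values, 'b.', label='noisy samples')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()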
Training, validation, and test dataset
The dataset is divided into three groups:
- Training set: to train the model [60%]
- Validation set: to measure model performance during training [20%]
- Test set: to test the model after training [20%]
This approach is known as the hold-out method.
TRAIN_SPLIT = int(0.6 * SAMPLES)
TEST_SPLIT = int(0.2 * SAMPLES + TRAIN_SPLIT)

# Split both arrays into training (60%), validation (20%), and test (20%) sets
x_train, x_validate, x_test = np.split(x_values, [TRAIN_SPLIT, TEST_SPLIT])
y_train, y_validate, y_test = np.split(y_values, [TRAIN_SPLIT, TEST_SPLIT])
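With SAMPLES = 5000, this yields 3000 training, 1000 validation, and 1000 test samples, which you can verify:
# Sanity check: the three splits must cover the whole dataset
assert len(x_train) + len(x_validate) + len(x_test) == SAMPLES
print(len(x_train), len(x_validate), len(x_test))  # 3000 1000 1000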
Let’s start to train!
TensorFlow Lite is designed to perform deep learning tasks: a type of machine learning based on artificial neural networks.
An artificial neural network (ANN) is a mechanism which mimics how a human brain learns. It is an interconnected group of nodes, inspired by a simplification of neurons in a brain.
The simplest type of ANN is the sequential model: a linear stack of layers where each layer has exactly one input tensor and one output tensor.
A tensor is a multi-dimensional array of numbers: it is the data structure that flows through the layers of the network.
TensorFlow uses the Keras open-source library to build neural networks in Python. Below is the code to initialize a simple sequential model:
import tensorflow as tf
from tensorflow.keras import layers

# Initialization of the ANN model
model = tf.keras.Sequential()
To build an ANN, there are two factors to consider:
- How many layers?
The input and output layers are already defined, given that the project is a single-input (x), single-output (y) problem. So the question is how many hidden layers to use. Hidden layers reside between the input and output layers. Typically, two hidden layers are enough; problems that require more than two are rarely encountered.
- How many neurons per layer?
Deciding the number of hidden layers is only a small part of the problem. You also need to determine how many neurons will be present in each of these layers. Using too few neurons in the hidden layers will result in underfitting; using too many may result in overfitting and an increase in training time. Usually, a "trial and error" method is used.
For this project, you will develop a sequential ANN with 4 layers:
- 1 input layer (1 node)
- 2 hidden layers (16 nodes each)
- 1 output layer (1 node)
# Add the input layer and a hidden layer with 16 units
model.add(layers.Dense(16, activation='relu', input_shape=(1,)))

# Add another hidden layer with 16 units
model.add(layers.Dense(16, activation='relu'))

# Add an output layer with 1 output unit
model.add(layers.Dense(1))
For the hidden neurons you have to define an activation function: activation='relu'
The activation function helps the network use the important information and suppress the irrelevant parts, much like the brain when it is fed a lot of information. (Popular activation functions are: Binary Step, Linear, ReLU, etc.)
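As a concrete illustration, ReLU (Rectified Linear Unit) simply passes positive values through and suppresses negative ones:
import numpy as np

# relu(x) = max(0, x): negative inputs are zeroed out
def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]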
Once the model is created, you have to:
- Configure the model
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
Before training the model, you need to configure it by defining:
* Loss function: a mathematical way of measuring how wrong the predictions are during the model's training (e.g., 'mse', Mean Squared Error, the most commonly used loss function for regression).
* Optimizer: changes the model's parameters during training, trying to minimize the loss function and make the predictions as correct as possible (e.g., 'rmsprop', Root Mean Square Propagation).
* Metrics: a mathematical way of measuring the performance of the model, like a loss function except that the results are not used for training (e.g., 'mae', Mean Absolute Error).
- Train the model
model.fit(x_train, y_train, epochs=1000, batch_size=16, validation_data=(x_validate, y_validate))
Train the model for a fixed number of iterations (epochs) on the training dataset (x_train, y_train):
* epochs: the number of times the algorithm works through the entire training dataset. The number is traditionally large, allowing the algorithm to run until the error has been sufficiently minimized; in the literature it is set to 10, 100, 500, 1000, and larger.
* batch_size: the number of samples propagated through the network at a time. With batch_size=16, the algorithm takes the first 16 samples from the dataset and trains the network, then takes the next 16 samples and trains again.
Pros: networks train faster with mini-batches.
Cons: the smaller the batch, the less accurate the gradient estimate.
* validation_data: data on which to evaluate the loss and metrics at the end of each epoch. The model will not be trained on this data.
- Use the model to make predictions on the test dataset.
predictions = model.predict(x_test)
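To connect the predictions back to the 'mse' loss and 'mae' metric defined above, you can recompute both by hand from the test predictions:
import numpy as np

# Mean Squared Error and Mean Absolute Error, computed manually
# (predictions has shape (N, 1), so flatten it to match y_test)
mse = np.mean((predictions.flatten() - y_test) ** 2)
mae = np.mean(np.abs(predictions.flatten() - y_test))
print(f"MSE: {mse:.4f}, MAE: {mae:.4f}")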
How good is your model?
The training procedure (model.fit(...)) makes multiple passes over the data, called epochs. At the end of each epoch the performance of the network is evaluated against the training and validation sets (loss: loss evaluated on the training dataset; val_loss: loss evaluated on the validation dataset).
Below is an example of the model.fit() console output:
Train on 3000 samples, validate on 1000 samples
Epoch 1/1000
3000/3000 [==============================] -
1s 389us/sample - loss: 4.0990 - mae: 1.6928 - val_loss: 4.2646 - val_mae: 1.7471
Epoch 2/1000
3000/3000 [==============================] -
0s 144us/sample - loss: 3.9574 - mae: 1.6658 - val_loss: 4.2061 - val_mae: 1.7287
...
Epoch 1000/1000
3000/3000 [==============================] -
1s 207us/sample - loss: 0.3102 - mae: 0.4438 - val_loss: 0.3989 - val_mae: 0.5040
One approach to rate the goodness of your model is to evaluate the trend of val_loss and loss during the training procedure.
The figure below shows the trend of our model's losses over the training steps:
From this graph, it is clear that:
- Losses decrease as the number of epochs increases.
- After a certain number of epochs, the error no longer decreases, so increasing the epochs further has no effect on the model's goodness.
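A practical way to exploit this observation is early stopping: Keras can halt training automatically once val_loss stops improving. A minimal sketch (the patience value below is an illustrative choice, not taken from the original training run):
# Stop training when val_loss has not improved for 50 consecutive epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=50)
model.fit(x_train, y_train, epochs=1000, batch_size=16,
          validation_data=(x_validate, y_validate),
          callbacks=[early_stop])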
And, then? Let’s play with Arduino!
Now that the model is ready and has reached a good level of accuracy, it is time to run inference on the “edge”.
In the next article, you will convert the TensorFlow model described in this tutorial into a Lite version, ready to deploy on the Arduino hardware board.