And oh, first blogpost in more than two years I guess.

This might be the right time to remind any readers that this project is likely to end up among the dozens of projects that get 3-5 commits before being abandoned. But hey, let's be optimistic: get ready for a deep dive into my pipe dreams, made of naive assumptions about the performance of Reinforcement Learning and cheap repurposed robots.


Build a mini robot car that can learn what it takes to reach a goal, without any programming, using reinforcement learning. This might involve exploring a large room to find an objective, going around obstacles, avoiding traps, etc.
Think of it like a real-life gridworld environment.



This mini robot will likely be based on the Sparki, a bot made for educational purposes. Its main advantage is precise movement (calibrated stepper motors), which a previous robotics project I did with some friends taught me is vital for building any kind of model: noise on sensors is OK, very large noise on movement is not. The robot also includes a whole variety of sensors: ultrasound, accelerometer, etc.

I will also add a Raspberry Pi Zero W with a camera on the front of the robot. This will allow streaming video and sensor data back to an external client (via WiFi), and relaying instructions from the client to the robot.


The main board on the robot will only be responsible for performing the low-level tasks it is asked to: "move forward by 10mm", "turn by 15 degrees", etc. This is similar to what I did in my last project, and one main reason to keep the embedded code as "dumb" as possible: it is much easier to iterate when the logic lives client-side (no need to re-program the robot and upload the code to the board every time).

These instructions will come from a client (a web UI at first, an ML model later) in RPC fashion, most likely as protobuf or JSON, and will be passed to the RPi before being transmitted to the board over a serial connection.

To summarise:

  • Clients will send instructions to the RPi in some RPC-like fashion, and will periodically receive sensor data from it.
  • The RPi will mostly serve as a proxy for passing instructions to the main board, but will also capture photo/video with its camera.
  • Sparki's board will interpret instructions and perform them, and periodically write sensor measurements to be transmitted back to clients by the RPi.
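To make this concrete, here is a minimal sketch of what the JSON-over-serial wire format could look like, assuming one JSON object per line. The command names and fields (`cmd`, `mm`, `deg`) are placeholders, not a finalized protocol:

```python
import json

def encode_instruction(cmd: str, **params) -> bytes:
    """Serialize an instruction as a newline-terminated JSON message,
    e.g. {"cmd": "forward", "mm": 10} or {"cmd": "turn", "deg": -15}."""
    return (json.dumps({"cmd": cmd, **params}) + "\n").encode("utf-8")

def decode_instruction(raw: bytes) -> dict:
    """Parse a single newline-terminated JSON message back into a dict."""
    return json.loads(raw.decode("utf-8"))

def proxy_once(client_stream, serial_port):
    """Forward one instruction from the client to the board, unchanged.
    Both arguments are assumed to expose readline()/write(); the RPi
    adds no logic of its own, it is only a proxy."""
    serial_port.write(client_stream.readline())
```

The nice property of a line-delimited format like this is that the C++ code on the board can parse it with a plain `readStringUntil('\n')`-style loop, no framing logic needed.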

Code-wise, the frontend will likely be in React (which I have never used before!), and the backend for the web UI, as well as the server communicating with the RPi, will be implemented in Python. The server on the RPi side will also be written in Python, and the glue code on the Sparki board will be in C++.

Machine learning

Hah! Thanks for being optimistic enough to believe it will ever get to this point.

Here is how one could model this task, through the Reinforcement Learning framework:

  • State: Sensor data from the last K steps (accelerometer, ultrasound sensor, camera footage, ...).
  • Actions: forward_10mm, backwards_10mm, [...], turn_right_5, turn_right_10, [...], turn_left_5, turn_left_10, ...
  • Reward: Distance remaining between the car and its goal, or a related metric.

The first two are obtained from, and performed by, the car; the reward can be obtained by placing a camera above the scene, capturing the robot and its goal, each marked in a distinct colour, to easily track the remaining distance between them. This overhead view might also help place the car back at the start position at the end of each episode.
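A sketch of how that overhead-camera reward could be computed, assuming the robot and the goal are painted in colours that simple thresholding can isolate. The boolean masks would come from something like OpenCV's `inRange` on an HSV frame; producing them is left out here:

```python
import numpy as np

def centroid(mask: np.ndarray) -> np.ndarray:
    """Mean (row, col) of the True pixels in a boolean H x W mask."""
    ys, xs = np.nonzero(mask)
    return np.array([ys.mean(), xs.mean()])

def reward(robot_mask: np.ndarray, goal_mask: np.ndarray) -> float:
    """Negative pixel distance between robot and goal: the closer the
    car gets, the higher (less negative) the reward."""
    return -float(np.linalg.norm(centroid(robot_mask) - centroid(goal_mask)))
```

Using the negative distance (rather than the raw distance) keeps the convention that the agent maximises reward; a shaped or sparse variant could be swapped in later.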

Now, using Deep Reinforcement Learning (I would most likely try a Double-DQN model first), one would hope the robot could learn the right sequence of actions needed to go around an obstacle, and learn that exploring the room to get sight of the goal might be better than simply bumping into an obstacle endlessly.
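For reference, the detail that distinguishes Double-DQN from vanilla DQN fits in a few lines: the online network *selects* the next action, but the target network *evaluates* it, which reduces the over-estimation bias of taking a plain max. A sketch of that target computation (argument names are mine):

```python
import numpy as np

def double_dqn_target(reward, q_online_next, q_target_next, done, gamma=0.99):
    """Double-DQN bootstrap target for one transition.
    q_online_next / q_target_next: Q-value vectors for the next state,
    one entry per action, from the online and target networks."""
    best_action = int(np.argmax(q_online_next))   # selection: online net
    bootstrap = q_target_next[best_action]        # evaluation: target net
    return reward + (0.0 if done else gamma * bootstrap)
```

The training loop would regress the online network's Q(s, a) towards this target, with the target network's weights periodically synced from the online one.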

Given the large state space (especially if camera footage is used), the training data collected from the robot might not be enough, in which case transfer learning might be considered (learning a decent policy in simulation, then training it further on the real robot).


As promised, here is another reminder that this whole thing might well end up abandoned.

I have ordered all of the hardware needed to start experimenting, and should receive it next week. I'll start right away by implementing the communication between all the pieces, remote control of the car, and the other basics needed before moving on to any machine learning.

I might start by building a simulated environment to test the viability of an RL model, and/or start with an easier task (for example, supplying a map and using Kalman filters to locate the car, then sending instructions to reach the destination, as was the goal of my previous project).
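Since the task is essentially a real-life gridworld, the simulated environment could start as small as this. Grid layout, action names, and reward shaping below are all placeholder choices of mine:

```python
class GridWorld:
    """Minimal gridworld: reach `goal` from (0, 0), walls block movement."""
    ACTIONS = ("up", "down", "left", "right")

    def __init__(self, width=5, height=5, goal=(4, 4), walls=()):
        self.width, self.height = width, height
        self.goal, self.walls = goal, set(walls)
        self.pos = (0, 0)

    def step(self, action):
        dx, dy = {"up": (0, -1), "down": (0, 1),
                  "left": (-1, 0), "right": (1, 0)}[action]
        x, y = self.pos[0] + dx, self.pos[1] + dy
        # Stay put when hitting a wall or the room boundary.
        if 0 <= x < self.width and 0 <= y < self.height and (x, y) not in self.walls:
            self.pos = (x, y)
        done = self.pos == self.goal
        return self.pos, (0.0 if done else -1.0), done  # -1 per step until goal
```

The -1-per-step reward pushes the agent towards short paths; the real robot would swap in the camera-based distance reward and the forward/turn action set, but an agent that works here is a cheap sanity check before touching hardware.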

You can track the progress on the project's GitHub repository, Miotono. (The name is a terrible word-play on my last project's name, Autonomee, which was already bad enough.)