Flying Smarter and Safer: Real-Time Reinforcement Learning for Collision-Avoidant Drones

02/10/2025

Researchers at the Information Processing and Telecommunications Center (IPTC), Universidad Politécnica de Madrid, have developed and validated a novel drone controller that combines reinforcement learning (RL) with neural networks to achieve safe, autonomous flight in real time. The work addresses a key challenge in aerial robotics: how to deploy machine learning methods in safety-critical embedded systems where strict timing and reliability requirements must be met.

The team trained an unmanned aerial vehicle (UAV) to maintain altitude and avoid dynamic obstacles—specifically, another UAV programmed to collide—within a constrained three-dimensional “critical zone.” Training took place in Microsoft’s AirSim simulator using the Soft Actor-Critic (SAC) algorithm, which supports continuous action spaces and promotes robust exploration. After 1.2 million training steps, the RL agent achieved a 91% success rate in test flights, including scenarios never encountered during training, highlighting its capacity to generalize and adapt.
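The article does not include the authors' training code, but the core idea behind SAC's robust exploration can be illustrated by its entropy-regularized critic target: the TD target mixes the clipped minimum of two Q-estimates with a bonus for policy entropy. The sketch below is a generic, self-contained illustration of that target, not the authors' implementation; the `gamma` and `alpha` values and the sample numbers are illustrative assumptions.

```python
import math

def soft_bellman_target(reward, done, q1_next, q2_next, log_prob_next,
                        gamma=0.99, alpha=0.2):
    """Entropy-regularized TD target used by SAC's critics.

    min(q1, q2) (clipped double-Q) curbs value overestimation, and the
    -alpha * log_prob term rewards policies that keep exploring, which is
    what makes SAC well suited to continuous-control tasks like flight.
    """
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * (1.0 - done) * soft_value

# Illustrative transitions (hypothetical numbers, not from the paper):
# mid-episode step vs. a terminal collision, which contributes only its reward.
y_mid = soft_bellman_target(reward=1.0, done=0.0,
                            q1_next=10.0, q2_next=9.0, log_prob_next=-1.5)
y_end = soft_bellman_target(reward=-5.0, done=1.0,
                            q1_next=10.0, q2_next=9.0, log_prob_next=-1.5)
```

In practice a library such as Stable-Baselines3 wires this target into two critic networks and a stochastic actor; the point here is only the shape of the objective.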

To verify real-time feasibility, the researchers implemented the trained neural network on a Xilinx Zynq UltraScale+ ZCU104 board through a Processor-in-the-Loop setup. Running on the XtratuM hypervisor, the controller met stringent 200 ms control-period deadlines while maintaining stable memory and processor usage. These results indicate that reinforcement learning can be safely executed in embedded aerospace environments, though further verification aligned with aviation software standards (e.g., DO-178C) is planned.
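The hypervisor and board setup cannot be reproduced in a few lines, but the real-time requirement itself is simple to state: the controller must finish each cycle within a fixed 200 ms period. As a rough sketch (in plain Python, not the flight software), a periodic loop can record per-cycle "slack" against that budget; `run_control_cycles` and the stand-in workload are hypothetical names introduced here for illustration.

```python
import time

CONTROL_PERIOD_S = 0.200  # 200 ms control period reported for the controller

def run_control_cycles(step_fn, n_cycles):
    """Run step_fn once per fixed period; return per-cycle slack in seconds.

    Negative slack means the step overran its 200 ms budget, i.e. a
    missed deadline in a hard real-time setting.
    """
    slacks = []
    next_release = time.perf_counter()
    for _ in range(n_cycles):
        start = time.perf_counter()
        step_fn()                       # e.g. the policy network's forward pass
        elapsed = time.perf_counter() - start
        slacks.append(CONTROL_PERIOD_S - elapsed)
        next_release += CONTROL_PERIOD_S
        sleep_for = next_release - time.perf_counter()
        if sleep_for > 0:
            time.sleep(sleep_for)       # idle until the next release point
    return slacks

# Hypothetical lightweight workload standing in for policy inference.
slacks = run_control_cycles(lambda: sum(i * i for i in range(10_000)), 5)
deadline_met = all(s >= 0 for s in slacks)
```

On the actual platform this scheduling is enforced by the hypervisor's partition schedule rather than cooperative sleeps, which is precisely what makes the timing guarantees suitable for certification-oriented verification.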

Potential applications extend beyond experimental UAVs. The approach can enable smarter collision-avoidance systems for logistics drones, autonomous inspection platforms, environmental monitoring, emergency response, and defense missions—any scenario demanding reliable navigation in unpredictable, dynamic airspace. By demonstrating both learning agility and safety compliance, this research opens the door to broader adoption of reinforcement learning in time-critical autonomous systems across robotics, transportation, and smart infrastructure.

Bibliographic reference:

Pérez-Muñoz, A.G., López-García, G., Quijano, H., & Alonso, A. (2025). Development and validation of a safe reinforcement learning drone controller. In Jornadas de Automática, 46, Cartagena, Spain. https://doi.org/10.17979/ja-cea.2025.46.12154

Alejandro Alonso Muñoz: GS / ORCID / LinkedIn

Ángel Grover Pérez Muñoz: GS / ORCID / LinkedIn

For more information: www.iptc.upm.es

LinkedIn: https://www.linkedin.com/company/iptc-upm/
