This project involved system identification and reinforcement learning for control of autonomous fixed-wing vehicles.
These are two examples of reinforcement learning applications for drones. On the left, black agents try to reach a target (shown in green) while red agents try to intercept them. On the right, agents learn to take a path that minimizes localization uncertainty.
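As a rough illustration of the uncertainty-minimizing behavior on the right, the reward can penalize the agent's estimator covariance in addition to distance to the goal, so paths that keep localization tight score higher. This is only a minimal sketch; the weights and the covariance input are assumptions for illustration, not the reward actually used in the project.

```python
import numpy as np

def localization_aware_reward(pos, goal, cov, w_dist=1.0, w_unc=0.1):
    """Trade off progress toward the goal against localization uncertainty.

    pos, goal : (3,) position vectors
    cov       : (3, 3) position covariance from the onboard estimator (e.g. an EKF)
    """
    dist_cost = np.linalg.norm(goal - pos)   # progress term
    unc_cost = np.trace(cov)                 # localization uncertainty term
    return -(w_dist * dist_cost + w_unc * unc_cost)
```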
This project studied the sim2real gap in reinforcement learning. The plots on the left show simulated drone flight paths. The plots on the right show the same flight paths flown in the physical world; notice they are messier.
One approach to overcoming the sim2real gap is linear system identification, where flight data is used to fit a simplified model of drone dynamics.
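To make "fit a simplified model" concrete, the sketch below fits a discrete-time linear model x_{t+1} ≈ A x_t + B u_t to logged states and inputs with ordinary least squares. The state/input layout and variable names are assumptions for illustration, not the exact model used in this project.

```python
import numpy as np

def fit_linear_dynamics(states, inputs):
    """Fit x_{t+1} ≈ A x_t + B u_t to flight logs via least squares.

    states : (T, n) array of state samples (e.g. attitude, rates, velocity)
    inputs : (T, m) array of control inputs (e.g. motor or surface commands)
    """
    X, U = states[:-1], inputs[:-1]   # regressors at time t
    X_next = states[1:]               # targets at time t+1
    Z = np.hstack([X, U])             # [x_t, u_t]
    # Solve Z @ [A^T; B^T] ≈ X_next in the least-squares sense.
    theta, *_ = np.linalg.lstsq(Z, X_next, rcond=None)
    n = states.shape[1]
    A, B = theta[:n].T, theta[n:].T
    return A, B
```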
This linear approach works (when combined with techniques like domain randomization and integrated error tracking), but limits the overall performance of the agent.
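Domain randomization, mentioned above, amounts to resampling the simulator's physical parameters every episode so the policy cannot overfit one set of dynamics. A minimal sketch, assuming a simulator object with settable mass, drag, and latency fields; the parameter names and ranges here are hypothetical:

```python
import numpy as np

def randomize_dynamics(sim, rng):
    """Resample physical parameters at the start of each training episode.

    The attribute names and ranges are illustrative only.
    """
    sim.mass = rng.uniform(0.8, 1.2) * sim.nominal_mass        # +/- 20% mass error
    sim.drag_coeff = rng.uniform(0.5, 1.5) * sim.nominal_drag  # wide drag uncertainty
    sim.motor_delay = rng.uniform(0.00, 0.03)                  # actuation latency [s]
    sim.sensor_noise_std = rng.uniform(0.0, 0.02)              # measurement noise

rng = np.random.default_rng(0)
# randomize_dynamics(sim, rng) would typically be called in the environment's reset().
```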
A more experimental yet potentially more performant system is CARRL: Control with Adaptive Robust Reinforcement Learning. The recurrent architecture is shown on the left, and simulated adaptive flights are shown on the right. Every time the color changes, so do the aerodynamic and mass properties, yet the RL system adapts and stabilizes. In simulation, this system has been shown to outperform robust PID, MRAC, sliding mode control, and trajectory optimization.
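The adaptive behavior comes from the recurrence: the policy's hidden state lets it infer the current aerodynamic and mass properties from recent observation/action history. Below is a minimal sketch of such a recurrent policy in PyTorch; the layer sizes, input layout, and use of a GRU are assumptions for illustration, not the actual CARRL network.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """GRU policy whose hidden state implicitly encodes the current dynamics."""

    def __init__(self, obs_dim, act_dim, hidden_dim=64):
        super().__init__()
        # Feed the previous action alongside the observation so the memory can
        # correlate commands with their effects and adapt to parameter changes.
        self.gru = nn.GRU(obs_dim + act_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs, prev_act, hidden=None):
        x = torch.cat([obs, prev_act], dim=-1)     # (batch, time, obs+act)
        out, hidden = self.gru(x, hidden)
        return torch.tanh(self.head(out)), hidden  # actions in [-1, 1]
```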
Achieving CARRL on hardware is still a work in progress. The top left shows a custom in-house drone, optimized for neural control, flying an imitation-learned controller. The bottom left shows an RL controller failing to stabilize due to the sim2real gap. The right shows promising progress toward hybrid (neural + physics-based) approaches to nonlinear system identification.
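One way to read "hybrid (neural + physics-based)" system identification is to keep an analytic physics model as a prior and train a small network only on the residual dynamics it cannot explain. A minimal sketch under that assumption; the physics_model callable and the layer sizes are illustrative, not the project's actual formulation:

```python
import torch
import torch.nn as nn

class HybridDynamics(nn.Module):
    """Predicted state derivative = analytic physics prior + learned residual."""

    def __init__(self, physics_model, state_dim, input_dim, hidden_dim=64):
        super().__init__()
        self.physics_model = physics_model  # known f(x, u), e.g. a rigid-body model
        self.residual = nn.Sequential(
            nn.Linear(state_dim + input_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, x, u):
        prior = self.physics_model(x, u)                       # modeled physics
        correction = self.residual(torch.cat([x, u], dim=-1))  # unmodeled effects
        return prior + correction
```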