Reinforcement Learning

Fixed-Wing Platform – System ID & Reinforcement Learning

Image of Fixed-wing2

This project involved system identification and reinforcement learning for the control of autonomous fixed-wing vehicles.

Adversarial & Navigation Informed Guidance with RL

These are two examples of reinforcement learning applications for drones. On the left, black agents try to reach a green target while red agents try to intercept them. On the right, agents learn to take a path that minimizes localization uncertainty.

Simulation vs Reality

Image of simulation-1
Image of simulation-2

This project studied the sim2real gap in reinforcement learning. The plots on the left show simulated drone flight paths. The plots on the right show the same flight paths flown in the physical world; note how much messier they are.

Linear System Identification

Image of lsi1
Image of lsi2

One approach to overcoming the sim2real gap is linear system identification, where flight data is used to fit a simplified model of drone dynamics.
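A minimal sketch of what such a fit might look like, assuming logged discrete-time states and control inputs (the dimensions, noise level, and matrix names here are illustrative, not the actual flight data used in this project):

```python
import numpy as np

# Hypothetical logged flight data: states x_t (e.g. body rates, velocities)
# and control inputs u_t, sampled at a fixed rate.
rng = np.random.default_rng(0)
T, n, m = 500, 4, 2
A_true = 0.9 * np.eye(n) + 0.01 * rng.standard_normal((n, n))
B_true = 0.1 * rng.standard_normal((n, m))

X = np.zeros((T, n))
U = rng.standard_normal((T - 1, m))
for t in range(T - 1):
    X[t + 1] = A_true @ X[t] + B_true @ U[t] + 0.001 * rng.standard_normal(n)

# Fit x_{t+1} ≈ A x_t + B u_t by stacking the data and solving least squares.
Z = np.hstack([X[:-1], U])                       # (T-1, n+m) regressors
Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
A_hat, B_hat = Theta[:n].T, Theta[n:].T          # recovered dynamics
```

With sufficiently excited inputs, the least-squares estimate recovers the linear model well, and the fitted (A, B) can then serve as the simulator's dynamics.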

Linear Reinforcement Learning

Image of ll1
Image of ll2

This linear approach works (when combined with techniques like domain randomization and integrated error tracking), but it limits the overall performance of the agent.
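Domain randomization can be sketched as resampling physical parameters at the start of each training episode, so the policy never overfits to one plant. The parameter names, ranges, and toy 1-D dynamics below are illustrative assumptions, not the values used in this project:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_dynamics():
    """Draw randomized physical parameters for one training episode."""
    return {
        "mass": rng.uniform(0.8, 1.2),         # kg, +/-20% of nominal
        "drag": rng.uniform(0.05, 0.15),       # linear drag coefficient
        "thrust_gain": rng.uniform(0.9, 1.1),  # actuator effectiveness
    }

def step(x, v, u, p, dt=0.02):
    """One Euler step of a toy 1-D point mass under the randomized params."""
    a = (p["thrust_gain"] * u - p["drag"] * v) / p["mass"]
    return x + v * dt, v + a * dt

# Each episode sees a different plant, so a policy trained across many
# episodes must be robust to the whole parameter range.
for episode in range(3):
    p = sample_dynamics()
    x, v = 0.0, 0.0
    for t in range(100):
        u = 1.0  # placeholder for the policy's action
        x, v = step(x, v, u, p)
```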

Nonlinear CARRL: Control with Adaptive Robust RL

Image of nonlinear

A more experimental yet potentially more performant system is CARRL: Control with Adaptive Robust Reinforcement Learning. The recurrent architecture is shown on the left, and simulated adaptive flights are shown on the right. Every time the color changes, so do the aerodynamic and mass properties, yet the RL system adapts and stabilizes. In simulation, this system has been shown to outperform robust PID, MRAC, sliding-mode control, and trajectory optimization.
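The exact CARRL architecture is not reproduced here, but the key idea of a recurrent policy, whose hidden state lets it infer the current (possibly changed) dynamics from recent observations, can be sketched with a generic GRU-style cell. All weights, sizes, and names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RecurrentPolicy:
    """Toy GRU-based policy: the hidden state acts as a running estimate
    of the plant, so the same weights can stabilize changing dynamics."""

    def __init__(self, obs_dim, act_dim, hidden=16):
        s = 0.1  # small random init for the sketch
        self.Wz = s * rng.standard_normal((hidden, obs_dim + hidden))
        self.Wr = s * rng.standard_normal((hidden, obs_dim + hidden))
        self.Wh = s * rng.standard_normal((hidden, obs_dim + hidden))
        self.Wo = s * rng.standard_normal((act_dim, hidden))
        self.h = np.zeros(hidden)

    def act(self, obs):
        xh = np.concatenate([obs, self.h])
        z = sigmoid(self.Wz @ xh)                 # update gate
        r = sigmoid(self.Wr @ xh)                 # reset gate
        cand = np.tanh(self.Wh @ np.concatenate([obs, r * self.h]))
        self.h = (1 - z) * self.h + z * cand      # memory of recent dynamics
        return np.tanh(self.Wo @ self.h)          # bounded action
```

Because the hidden state is updated every step, a change in aerodynamic or mass properties shows up as a change in the observation stream, which the recurrence can fold into its implicit plant estimate.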

CARRL on Hardware

Supervised Neural Controller

RL Neural Controller

Image of CARRL2

Achieving CARRL on hardware is still a work in progress. The top left shows a custom in-house drone, optimized for neural control, flying an imitation-learned controller. The bottom left shows an RL controller failing to stabilize due to the sim2real gap. The right shows promising progress toward hybrid (neural + physics-based) approaches to nonlinear system ID.