IARCS Verification Seminar Series



Title: Specification-Guided Reinforcement Learning

Speaker: Kishor Jothimurugan


Kishor Jothimurugan is an incoming Quantitative Researcher at Two Sigma. He earned his PhD in Computer and Information Science from the University of Pennsylvania, where he was advised by Prof. Rajeev Alur. His research interests lie at the intersection of Formal Methods and Machine Learning. In particular, he is interested in applying formal methods to improve the applicability and reliability of reinforcement learning, verifying systems with neural network components, and using neurosymbolic approaches to improve program synthesis and analysis.


When: Tuesday, 20 June 2023 at 1900 hrs (IST)   Slides  Video  

Abstract:
Recent advances in Reinforcement Learning (RL) have enabled data-driven controller design for autonomous systems such as robotic arms and self-driving cars. Applying RL to such a system typically involves encoding the objective as a reward function (mapping transitions of the system to real values) and then training a neural network controller (from simulations of the system) to maximize the expected reward. However, many challenges arise when we try to train controllers to perform complex long-horizon tasks---e.g., navigating a car along a track with multiple turns. First, it is quite challenging to manually define well-shaped reward functions for such tasks; it is much more natural to use a high-level specification language such as Linear Temporal Logic (LTL) to specify them. Second, existing algorithms for learning controllers from logical specifications do not scale well to complex tasks, for several reasons including the use of sparse rewards and a lack of compositionality. Furthermore, existing algorithms for verifying neural network policies (trained using RL) cannot easily be applied to verify policies for complex long-horizon tasks due to large approximation errors.
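
To make the sparse-reward issue concrete, here is a minimal sketch (not from the talk; all names, constants, and dynamics are illustrative assumptions) contrasting a hand-shaped dense reward with the sparse 0/1 reward induced by a reach-avoid specification such as F goal & G !obstacle:

import numpy as np

GOAL = np.array([5.0, 5.0])
OBSTACLE = np.array([2.5, 2.5])
GOAL_RADIUS = 0.5
OBSTACLE_RADIUS = 0.5

def shaped_reward(state, next_state):
    """Dense, hand-tuned reward: progress toward the goal minus an obstacle
    penalty. Designing such functions by hand for multi-stage tasks is the
    manual effort the abstract argues against."""
    progress = np.linalg.norm(state - GOAL) - np.linalg.norm(next_state - GOAL)
    penalty = 10.0 if np.linalg.norm(next_state - OBSTACLE) < OBSTACLE_RADIUS else 0.0
    return progress - penalty

def spec_reward(trajectory):
    """Sparse reward induced by the LTL-style spec 'F goal & G !obstacle'
    (eventually reach the goal while always avoiding the obstacle):
    1 if the whole trajectory satisfies it, 0 otherwise."""
    reaches_goal = any(np.linalg.norm(s - GOAL) < GOAL_RADIUS for s in trajectory)
    avoids_obstacle = all(np.linalg.norm(s - OBSTACLE) >= OBSTACLE_RADIUS
                          for s in trajectory)
    return float(reaches_goal and avoids_obstacle)

# A random-walk trajectory almost never satisfies the spec, so early training
# sees reward 0 on nearly every episode; this is the sparse-reward problem.
rng = np.random.default_rng(0)
trajectory = np.cumsum(rng.normal(scale=0.3, size=(100, 2)), axis=0)
print("spec reward:", spec_reward(trajectory))

Because almost every trajectory sampled early in training scores 0 under spec_reward, the learner receives essentially no signal, which is the scalability obstacle the abstract refers to.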

In this talk, I will present my work on using logical specifications to define RL tasks. First, I'll talk about algorithms for learning control policies from such specifications. Then, I'll show how we can use logical task decompositions to scale verification to long horizons.
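
As a rough illustration of the decomposition idea (a hedged sketch under assumed names and dynamics, not the speaker's algorithm), a sequential specification such as F (A & F B), i.e., "eventually reach A, and after that eventually reach B", can be split into reachability subtasks that are each checked, learned, or verified over a short segment and then composed:

import numpy as np

def reach(target, radius=0.5):
    """Predicate factory: is a state within `radius` of `target`?"""
    target = np.asarray(target, dtype=float)
    return lambda state: np.linalg.norm(state - target) < radius

# Decomposition of 'F (A & F B)' into two ordered reachability subtasks.
SUBTASKS = [reach([2.0, 0.0]), reach([4.0, 0.0])]

def satisfies_decomposed(trajectory, subtasks):
    """Check the subtasks in order along a trajectory. Each subtask only
    requires reasoning over a short segment rather than the full horizon at
    once, which is what makes decomposition attractive for verification."""
    i = 0
    for state in trajectory:
        if i < len(subtasks) and subtasks[i](state):
            i += 1
    return i == len(subtasks)

trajectory = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0]], dtype=float)
print(satisfies_decomposed(trajectory, SUBTASKS))  # True: A then B, in order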