The Program in Applied and Computational Mathematics (PACM)

Announces

FINAL PUBLIC ORAL EXAMINATION OF

Jihao Long

Date: Monday, May 1, 2023

Time: 8:00 PM (EDT)

Location: Fine Hall 214

An electronic copy of Jihao's dissertation is available upon request. Please email bwysocka@princeton.edu if you wish to receive a PDF copy.

________________________________________________________

High-dimensional Reinforcement Learning and Optimal Control Problems

Abstract: Reinforcement learning and optimal control are two approaches to solving decision-making problems for dynamical systems, from data-driven and model-driven perspectives, respectively. Modern applications of these approaches often involve high-dimensional state and action spaces, making it essential to develop efficient high-dimensional algorithms. This dissertation addresses this challenge from two perspectives.

In the first part, we analyze the sample complexity of reinforcement learning in a general reproducing kernel Hilbert space (RKHS). We focus on a family of Markov decision processes in which the reward functions lie in the unit ball of an RKHS and the transition probabilities lie in an arbitrary set. We introduce a quantity called the perturbational complexity by distribution mismatch, which describes the complexity of the admissible state-action distribution space in response to a perturbation in the RKHS of a given scale. We show that this quantity provides both a lower bound on the error of all possible algorithms and an upper bound on the error of two specific algorithms for the reinforcement learning problem. The decay of the perturbational complexity with respect to the scale therefore measures the difficulty of the reinforcement learning problem. We further provide concrete examples and discuss whether the perturbational complexity decays quickly in each of them.

In the second part, we introduce an efficient algorithm for learning high-dimensional closed-loop optimal control. The approach modifies a recently proposed supervised-learning-based method, which leverages powerful open-loop optimal control solvers to generate training data and neural networks as efficient high-dimensional function approximators to fit the closed-loop optimal control. This approach handles certain high-dimensional optimal control problems successfully but still performs poorly on more challenging ones. A crucial reason for the failure is the so-called distribution mismatch phenomenon brought about by the controlled dynamics. In this dissertation, we investigate this phenomenon and propose the initial value problem (IVP) enhanced sampling method to mitigate it. We demonstrate that the proposed sampling strategy significantly improves performance on the tested control problems, including the classical linear-quadratic regulator, the optimal landing problem of a quadrotor, and the optimal reaching problem of a 7-DoF manipulator.
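
To make the RKHS perturbation concrete: for a kernel k with RKHS H, the worst-case change in an expected reward under a perturbation g with ||g||_H <= eps satisfies sup_{||g||_H <= eps} (E_mu[g] - E_nu[g]) = eps * MMD(mu, nu), where MMD is the maximum mean discrepancy between two state-action distributions. The sketch below estimates this quantity from samples. It illustrates only this one ingredient, not the dissertation's full definition of perturbational complexity, and the Gaussian kernel, bandwidth, and sample distributions are illustrative assumptions.

    import numpy as np

    def gaussian_kernel(X, Y, sigma=1.0):
        # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * sigma**2))

    def mmd(X, Y, sigma=1.0):
        # Biased sample estimate of MMD(mu, nu) from X ~ mu, Y ~ nu.
        Kxx = gaussian_kernel(X, X, sigma)
        Kyy = gaussian_kernel(Y, Y, sigma)
        Kxy = gaussian_kernel(X, Y, sigma)
        return np.sqrt(max(Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean(), 0.0))

    # Worst-case change in expected reward over the eps-ball of the RKHS:
    #   sup_{||g||_H <= eps} (E_mu[g] - E_nu[g]) = eps * MMD(mu, nu).
    rng = np.random.default_rng(0)
    mu_samples = rng.normal(0.0, 1.0, size=(500, 2))  # state-action samples from mu
    nu_samples = rng.normal(0.5, 1.0, size=(500, 2))  # state-action samples from nu
    eps = 0.1
    print("worst-case reward gap:", eps * mmd(mu_samples, nu_samples))

The kernel trick reduces the supremum over an infinite-dimensional ball of functions to a computable quantity, which is part of what makes RKHS reward assumptions analytically tractable.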
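
The IVP-enhanced sampling strategy can be sketched on a hypothetical toy problem. Below, the scalar dynamics x' = u with cost (1/2) * integral of u^2 dt plus terminal cost (1/2) * x(T)^2 admit a closed-form open-loop solution, and a linear-in-features least-squares fit stands in for the neural network; the problem, the feature choice, and all function names are illustrative assumptions, not the dissertation's setting. The loop alternates between solving an initial value problem with the current learned controller and solving open-loop problems from the states that controller actually reaches, so later training data follows the distribution induced by the learned controller rather than only the exactly optimal trajectories; this is the mechanism that mitigates the distribution mismatch.

    import numpy as np

    T = 1.0
    rng = np.random.default_rng(0)

    def solve_open_loop(t0, x0, n=20):
        # Closed-form open-loop solution of the toy problem from (t0, x0):
        # returns (t, x, u*) samples along the optimal trajectory.
        ts = np.linspace(t0, T, n)
        xs = x0 * (1.0 + T - ts) / (1.0 + T - t0)
        us = -xs / (1.0 + T - ts)
        return ts, xs, us

    def fit(data):
        # Least-squares fit of u ~ x * (theta0 + theta1 * t + theta2 * t^2),
        # a stand-in for training a neural network on the collected data.
        ts, xs, us = (np.concatenate(a) for a in zip(*data))
        Phi = np.stack([xs, ts * xs, ts**2 * xs], axis=1)
        theta, *_ = np.linalg.lstsq(Phi, us, rcond=None)
        return theta

    def policy(theta, t, x):
        return x * (theta[0] + theta[1] * t + theta[2] * t**2)

    def rollout(theta, x0, t_end, dt=0.02):
        # Solve the initial value problem x' = policy(t, x) up to t_end
        # with forward Euler, using the current learned controller.
        t, x = 0.0, float(x0)
        while t < t_end - 1e-9:
            x += dt * policy(theta, t, x)
            t += dt
        return x

    # Stage 0: train only on open-loop solutions started at t = 0.
    data = [solve_open_loop(0.0, x0) for x0 in rng.uniform(-2.0, 2.0, 10)]
    theta = fit(data)

    # IVP-enhanced stages: roll the current controller out to an intermediate
    # time, then solve open-loop problems from the states it actually reaches.
    for t_mid in (0.3, 0.6):
        for x0 in rng.uniform(-2.0, 2.0, 10):
            data.append(solve_open_loop(t_mid, rollout(theta, x0, t_mid)))
        theta = fit(data)

    # True feedback is u = -x / (1 + T - t); the fitted coefficients should be
    # roughly [-0.5, -0.25, -0.125] for t in [0, 1].
    print("fitted coefficients:", theta)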

________________________________________________________