Logical reasoning and Transformers

Joint PACM & Center for Statistics and Machine Learning Colloquium.

Abstract: Transformers have become the dominant neural network architecture in deep learning, in particular with the GPT language models. While they dominate in language and vision tasks, their performance is less convincing in so-called “reasoning” tasks.

In this talk, we introduce the “generalization on the unseen (GOTU)” objective to test the reasoning capabilities of neural network architectures, primarily Transformers, on Boolean/logic tasks. We first give experimental results showing that such networks have a strong “minimal degree bias”: they tend to find specific interpolators of lowest degree. Using basic concepts from Boolean Fourier analysis and algebraic geometry, we characterize such minimal degree profile interpolators and prove two theorems about the convergence of (S)GD to such functions on basic architectures. Since the minimal degree profile is not desirable in many reasoning tasks, we discuss various methods to correct this bias and consequently improve the reasoning capabilities of architectures. Joint work with S. Bengio, A. Lotfi, K. Rizk (ICML23 paper award).

Prof. Emmanuel Abbe - Bio

Emmanuel Abbe received his Ph.D. degree from the EECS Department at the Massachusetts Institute of Technology (MIT) in 2008, and his M.S. degree from the Department of Mathematics at the Ecole Polytechnique Fédérale de Lausanne (EPFL) in 2003. He was at Princeton University as an assistant professor from 2012 to 2016 and an associate professor from 2016, jointly in the Program for Applied and Computational Mathematics and the Department of Electrical Engineering. He joined EPFL in 2018 as a Full Professor, jointly in the Mathematics Institute and the School of Computer and Communication Sciences, where he holds the Chair of Mathematical Data Science. He is also co-director of the EPFL Bernoulli Center for Fundamental Studies and an academic consultant at Apple. He is the recipient/co-recipient of the Foundation Latsis International Prize; the Bell Labs Prize; the von Neumann Fellowship from the Institute for Advanced Study; the IEEE Information Theory Society Paper Award; the Simons-NSF Mathematics of Deep Learning Collaborative Research Award; and the ICML Outstanding Paper Award.