Joint PACM & Center for Statistics and Machine Learning Colloquium.
Abstract: Transformers have become the dominant neural network architecture in deep learning, notably through the GPT language models. While they dominate language and vision tasks, their performance is less convincing on so-called “reasoning” tasks.
In this talk, we introduce the “generalization on the unseen” (GOTU) objective to test the reasoning capabilities of neural network architectures, primarily Transformers, on Boolean/logic tasks. We first present experimental results showing that such networks have a strong “minimal degree bias”: they tend to find specific interpolators of lowest degree. Using basic concepts from Boolean Fourier analysis and algebraic geometry, we characterize these minimal degree profile interpolators and prove two theorems on the convergence of (S)GD to such functions for basic architectures. Since the minimal degree profile is undesirable in many reasoning tasks, we discuss various methods to correct this bias and thereby improve the reasoning capabilities of these architectures. Joint work with S. Bengio, A. Lotfi, and K. Rizk (ICML 2023 paper award).
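To make the “minimal degree bias” concrete, here is a small toy illustration we constructed ourselves; it is not taken from the talk or the paper. It assumes inputs in {-1,+1}^n, a target f(x) = x0·x1 (a degree-2 monomial), and a “seen” training domain in which x0 is frozen to +1, so the “unseen” half is x0 = -1. Substituting x0 = +1 gives an interpolator that matches f on every seen input but has degree 1 instead of 2, and it is wrong on every unseen input. The helper names `fourier_coefficients` and `freeze_first` are hypothetical, and the Fourier coefficients are computed by brute-force averaging, which is only feasible for small n.

```python
from itertools import combinations, product
from math import prod


def fourier_coefficients(f, n):
    """Return the nonzero Boolean Fourier coefficients {S: hat f(S)} of f on {-1,+1}^n.

    S is a tuple of coordinate indices; hat f(S) = E_x[f(x) * chi_S(x)],
    where chi_S(x) = prod_{i in S} x_i (the empty product is 1).
    """
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for d in range(n + 1):
        for S in combinations(range(n), d):
            c = sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
            if abs(c) > 1e-12:
                coeffs[S] = c
    return coeffs


def target(x):
    # Hypothetical target: the degree-2 monomial x0 * x1.
    return x[0] * x[1]


def seen(x):
    # Hypothetical training domain: x0 is frozen to +1.
    return x[0] == 1


def freeze_first(f):
    # Substitute x0 = +1: agrees with f on the seen half and ignores x0 entirely,
    # which lowers the degree of every monomial that contained x0.
    return lambda x: f((1,) + x[1:])


n = 3
points = list(product((-1, 1), repeat=n))
interp = freeze_first(target)

# The low-degree function interpolates the training domain exactly...
assert all(interp(x) == target(x) for x in points if seen(x))

print(fourier_coefficients(target, n))  # {(0, 1): 1.0}  -> degree 2
print(fourier_coefficients(interp, n))  # {(1,): 1.0}    -> degree 1

# ...but disagrees with the target on every unseen input (x0 = -1).
unseen = [x for x in points if not seen(x)]
err = sum(interp(x) != target(x) for x in unseen) / len(unseen)
print(err)  # 1.0
```

In this toy setting, a learner biased toward the lowest-degree interpolator would output x1 rather than x0·x1, which is exactly the failure to generalize on the unseen that the GOTU objective is designed to expose.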