Euclidean Distance Deflation under High-Dimensional Heteroskedastic Noise

IDeAS
Apr 30, 2026
1:30 - 2:30 pm
224 FINE HALL

Abstract

Pairwise Euclidean distances are a basic ingredient in many machine learning and data analysis methods. In many applications, however, these distances are computed from data contaminated by heteroskedastic noise, in which different observations are corrupted at different noise levels. This can substantially distort the geometry of the data and complicate downstream tasks that rely on accurate distance information. In this talk, I will discuss the problem of recovering meaningful pairwise distances under high-dimensional heteroskedastic noise. I will describe a principled approach, with theoretical guarantees, for estimating observation-specific noise levels and correcting the distorted distances without prior knowledge of the underlying clean data structure or noise distribution. I will also highlight simulations and experiments with real single-cell RNA sequencing data that illustrate the effectiveness of our approach. This is joint work with Keyi Li and Yuval Kluger.