# Tight sample complexity bounds via dualizing LeCam's method

In this talk we consider a general question of estimating linear functional of the distribution based on the noisy samples from it. We discover that the (two-point) LeCam lower bound is in fact achievable by optimizing bias-variance tradeoff of an empirical-mean type of estimator. We extend the method to certain symmetric functionals of high-dimensional parametric models.

Next, we apply this general framework to two problems: population recovery and predicting the number of unseen species. In population recovery, the goal is to estimate an unknown high-dimensional distribution (in $L_\infty$-distance) from noisy samples. In the case of \textit{erasure} noise, i.e. when each coordinate is erased with probability $\epsilon$, we discover a curious phase transition in sample complexity at $\epsilon=1/2$. In the second (classical) problem, we observe $n$ iid samples from an unknown distribution on a countable alphabet and the goal is to predict the number of new species that will be observed in the next (unseen) $tn$ samples. Again, we discover a phase transition at $t=1$. In both cases, the complete characterization of sample complexity relies on complex-analytic methods, such as Hadamard's three-lines theorem.

Joint work with Yihong Wu (Yale).

*Yury Polyanskiy is an Associate Professor of Electrical Engineering and Computer Science and a member of LIDS at MIT.* *Yury received M.S. degree in applied mathematics and physics from the Moscow Institute of Physics and Technology, Moscow, Russia in 2005 and Ph.D. degree in electrical engineering from Princeton University, Princeton, NJ in 2010. Currently, his research focuses on basic questions in information theory, error-correcting codes, wireless communication and fault-tolerant and defect-tolerant circuits. Dr. Polyanskiy won the 2013 NSF CAREER award and 2011 IEEE Information Theory Society Paper Award.*