Scattering Invariants for Audio Classification

Joakim Anden - Princeton University
Apr 16 2014 - 3:00pm
110 Fine Hall

Representations for classification tasks reduce the amount of training data needed by incorporating invariance to transformations that do not affect class membership, such as time-shifting and time-warping in audio. The scattering transform, a cascade of wavelet transforms and modulus operators, satisfies these conditions while capturing discriminative temporal structure and has similarities to traditional audio representations. Unfortunately, the transform is unsuited to capturing joint time-frequency structure, limiting its discriminative power. To remedy this, the joint time-frequency scattering transform is introduced, replacing one-dimensional with two-dimensional wavelet decompositions in the scattering cascade. Using these representations, state-of-the-art results are obtained on phone segment classification and musical genre recognition.
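
As a rough illustration of the cascade described above (wavelet transform, modulus, then time averaging), here is a minimal one-dimensional sketch in Python with NumPy. It is not the speaker's implementation: the Gaussian band-pass filters, the parameter values, and the names gabor_bank, lowpass and scattering are all assumptions made for this example, and the joint time-frequency variant is not covered.

```python
import numpy as np

def gabor_bank(n, num_filters):
    """Gaussian band-pass filters in the Fourier domain (assumed design, one per octave)."""
    freqs = np.fft.fftfreq(n)                        # cycles per sample
    centers = 0.4 * 2.0 ** (-np.arange(num_filters)) # center frequencies, high to low
    # Bandwidth proportional to center frequency, as for wavelets.
    return np.array([np.exp(-(freqs - xi) ** 2 / (2 * (xi / 4) ** 2)) for xi in centers])

def lowpass(n, sigma):
    """Gaussian low-pass filter in the Fourier domain, used for time averaging."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-freqs ** 2 / (2 * sigma ** 2))

def scattering(x, num_filters=6, avg_sigma=0.005):
    """Zeroth-, first- and second-order scattering coefficients of a 1-D signal."""
    n = len(x)
    psi = gabor_bank(n, num_filters)
    phi = lowpass(n, avg_sigma)

    def conv(sig, filt):
        # Circular convolution via the FFT.
        return np.fft.ifft(np.fft.fft(sig) * filt)

    s0 = conv(x, phi).real
    s1, s2 = [], []
    for j1 in range(num_filters):
        u1 = np.abs(conv(x, psi[j1]))            # wavelet transform + modulus
        s1.append(conv(u1, phi).real)            # time averaging
        for j2 in range(j1 + 1, num_filters):    # only lower-frequency second filters
            u2 = np.abs(conv(u1, psi[j2]))       # second layer of the cascade
            s2.append(conv(u2, phi).real)
    return s0, np.array(s1), np.array(s2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1024)
    s0, s1, s2 = scattering(x)
    print(s1.shape, s2.shape)
```

In this sketch a circular time shift of the input shifts the coefficients by the same amount, so averaging each coefficient over the whole signal yields features that do not change under the shift. The averaging width avg_sigma controls how much local invariance is built in before that final average.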