Many transform coders use a form of nonlinear approximation that selects all coefficients whose magnitudes exceed a threshold, encodes their positions, and then quantizes their values. This two-stage scheme is very effective at low rates. We derive an upper bound on the operational rate-distortion function of such threshold-based nonlinear approximations. The bound is applied to the spike process, a generic model for sparse transform coefficients, and to a Gaussian mixture process, which is a better model for wavelet coefficients in image transform coders. The results exhibit the same change in distortion decay between low and high rates that is typical of such image coders.
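
To make the two-stage scheme concrete, the following is a minimal sketch of threshold selection followed by quantization of the retained values. It is illustrative only and is not the bound derived in this work: the function names, the uniform quantizer with step size `step`, and the binomial count used as a position cost are all assumptions introduced for this example.

```python
import math


def threshold_approximation(coeffs, threshold, step):
    """Two-stage sketch: keep coefficients with |c| > threshold, then quantize them.

    The uniform quantizer (step size `step`) is an assumption made for
    illustration; the text does not specify a particular quantizer.
    """
    # Stage 1: positions of the significant coefficients.
    positions = [i for i, c in enumerate(coeffs) if abs(c) > threshold]
    # Stage 2: quantization indices of the retained values.
    indices = [round(coeffs[i] / step) for i in positions]
    # Decoder side: discarded coefficients are reconstructed as zero.
    recon = [0.0] * len(coeffs)
    for i, q in zip(positions, indices):
        recon[i] = q * step
    return positions, indices, recon


def mse(x, y):
    """Mean squared error between a signal and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def position_rate_bits(n, k):
    """Bits needed to index which k of n positions were kept (hypothetical cost model)."""
    return math.log2(math.comb(n, k))


# Hypothetical sparse coefficient vector: two large values among small ones.
coeffs = [0.05, -2.3, 0.1, 1.7, -0.02]
positions, indices, recon = threshold_approximation(coeffs, threshold=0.5, step=0.5)
distortion = mse(coeffs, recon)
side_info = position_rate_bits(len(coeffs), len(positions))
```

Raising `threshold` keeps fewer coefficients, so the position cost and the number of quantized values shrink while the distortion from discarded coefficients grows. This trade-off is what an operational rate-distortion analysis of such a scheme has to account for.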