On Using Extended Statistical Queries to Avoid Membership Queries
Nader H. Bshouty, Vitaly Feldman;
2(Feb):359-395, 2002.
Abstract
The Kushilevitz-Mansour (KM) algorithm is an algorithm that finds all the "large" Fourier coefficients of a Boolean function. It is the main tool for learning decision trees and DNF expressions in the PAC model with respect to the uniform distribution. The algorithm requires access to a membership query (MQ) oracle. Such access is often unavailable in learning applications, and thus the KM algorithm cannot be used. We significantly weaken this requirement by producing an analogue of the KM algorithm that uses extended statistical queries (SQs in which the expectation is taken with respect to a distribution given by the learning algorithm). We restrict the set of distributions that a learning algorithm may use for its statistical queries to product distributions in which each bit is 1 with probability rho, 1/2, or 1-rho for a constant 1/2 > rho > 0 (we denote the resulting model by SQ-D_rho). Our analogue, which we call the Bounded Sieve (BS), finds all the "large" Fourier coefficients of degree lower than c log(n). We use BS to learn decision trees and, by adapting Freund's boosting technique, we give an algorithm that learns DNF in SQ-D_rho. An important property of the model is that its algorithms can be simulated by MQs with persistent noise. With some modifications, BS can also be simulated by MQs with product attribute noise (i.e., for a query x the oracle flips every bit of x independently with some constant probability and returns the value of the target function at the resulting point) and classification noise. This implies learnability of decision trees and weak learnability of DNF in the presence of this non-trivial noise. In the second part of this paper we develop a characterization of learnability with these extended statistical queries. We show that our characterization, when applied to SQ-D_rho, is tight in terms of learning parity functions. We extend the result given by Blum et al. by proving that there is a class learnable in the PAC model with random classification noise but not learnable in SQ-D_rho.
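To make the extended-SQ model concrete, here is a minimal sampling-based sketch (not taken from the paper): it draws examples from a product distribution in which each bit is 1 with probability rho, 1/2, or 1-rho, and averages a query function over those examples, which is the expectation an SQ-D_rho oracle would return up to the query's tolerance. The function names, the query signature psi(x, f(x)), and the use of empirical averaging are illustrative assumptions.

import random

def sample_product(bit_probs):
    """Draw x from the product distribution where bit i is 1 with
    probability bit_probs[i], each taken from {rho, 1/2, 1 - rho}."""
    return [1 if random.random() < p else 0 for p in bit_probs]

def estimate_extended_sq(psi, f, bit_probs, num_samples=100000):
    """Approximate the extended statistical query E_D[psi(x, f(x))],
    where D is the product distribution described by bit_probs
    (illustrative stand-in for the oracle's exact answer)."""
    total = 0.0
    for _ in range(num_samples):
        x = sample_product(bit_probs)
        total += psi(x, f(x))
    return total / num_samples

# Example: estimating a Fourier coefficient of a {-1,+1}-valued f
# under the uniform distribution (all bit probabilities equal to 1/2).
if __name__ == "__main__":
    n = 4
    f = lambda x: -1 if (x[0] ^ x[1]) else 1        # parity on bits 0, 1
    chi = lambda x: (-1) ** (x[0] + x[1])           # character for {0, 1}
    psi = lambda x, y: y * chi(x)
    print(estimate_extended_sq(psi, f, [0.5] * n))  # close to 1.0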
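The product attribute noise oracle described above also admits a short sketch, again under assumed names and a simple functional interface rather than the paper's notation: each bit of the query is flipped independently with a constant probability before the hidden target is evaluated, and optionally the returned label is itself flipped to model classification noise.

import random

def noisy_mq(f, x, attr_flip_prob, label_flip_prob=0.0):
    """Membership query with product attribute noise (and, optionally,
    classification noise): each bit of the query x is flipped
    independently with probability attr_flip_prob, the hidden target f
    is evaluated at the corrupted point, and the returned label is
    flipped with probability label_flip_prob."""
    corrupted = [b ^ 1 if random.random() < attr_flip_prob else b for b in x]
    label = f(corrupted)
    if random.random() < label_flip_prob:
        label = -label  # assumes {-1, +1}-valued labels
    return label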
[abs] [pdf] [ps.gz] [ps]