On Using Extended Statistical Queries to Avoid Membership Queries
Nader H. Bshouty, Vitaly Feldman;
2(Feb):359-395, 2002.
Abstract
The Kushilevitz-Mansour (KM) algorithm is an algorithm that finds all the "large" Fourier coefficients of a Boolean function. It is the main tool for learning decision trees and DNF expressions in the PAC model with respect to the uniform distribution. The algorithm requires access to a membership query (MQ) oracle. Such access is often unavailable in learning applications, and thus the KM algorithm cannot be used. We significantly weaken this requirement by producing an analogue of the KM algorithm that uses extended statistical queries (SQs in which the expectation is taken with respect to a distribution given by the learning algorithm). We restrict the set of distributions that a learning algorithm may use for its statistical queries to product distributions in which each bit is 1 with probability rho, 1/2, or 1-rho for a constant 1/2 > rho > 0 (we denote the resulting model by SQ-D_rho). Our analogue, which we call the Bounded Sieve (BS), finds all the "large" Fourier coefficients of degree lower than c log(n). We use BS to learn decision trees and, by adapting Freund's boosting technique, we give an algorithm that learns DNF in SQ-D_rho. An important property of the model is that its algorithms can be simulated by MQs with persistent noise. With some modifications, BS can also be simulated by MQs with product attribute noise (i.e., for a query x the oracle flips every bit of x independently with some constant probability and returns the value of the target function at the resulting point) and classification noise. This implies learnability of decision trees and weak learnability of DNF in the presence of this non-trivial noise. In the second part of this paper we develop a characterization of learnability with these extended statistical queries. We show that our characterization, when applied to SQ-D_rho, is tight in terms of learning parity functions. We extend the result given by Blum et al. by proving that there is a class learnable in the PAC model with random classification noise but not learnable in SQ-D_rho.
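To make the extended-SQ model concrete, here is a minimal sampling-based sketch (not taken from the paper): it draws examples from a product distribution in which each bit is 1 with probability rho, 1/2, or 1-rho, and averages a query function over those examples, which is the expectation an SQ-D_rho oracle would return up to the query's tolerance. The function names, the query signature psi(x, f(x)), and the use of empirical averaging are illustrative assumptions.

import random

def sample_product(bit_probs):
    """Draw x from the product distribution where bit i is 1 with
    probability bit_probs[i], each taken from {rho, 1/2, 1 - rho}."""
    return [1 if random.random() < p else 0 for p in bit_probs]

def estimate_extended_sq(psi, f, bit_probs, num_samples=100000):
    """Approximate the extended statistical query E_D[psi(x, f(x))],
    where D is the product distribution described by bit_probs
    (illustrative stand-in for the oracle's exact answer)."""
    total = 0.0
    for _ in range(num_samples):
        x = sample_product(bit_probs)
        total += psi(x, f(x))
    return total / num_samples

# Example: estimating a Fourier coefficient of a {-1,+1}-valued f
# under the uniform distribution (all bit probabilities equal to 1/2).
if __name__ == "__main__":
    n = 4
    f = lambda x: -1 if (x[0] ^ x[1]) else 1        # parity on bits 0, 1
    chi = lambda x: (-1) ** (x[0] + x[1])           # character for {0, 1}
    psi = lambda x, y: y * chi(x)
    print(estimate_extended_sq(psi, f, [0.5] * n))  # close to 1.0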
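The product attribute noise oracle described above also admits a short sketch, again under assumed names and a simple functional interface rather than the paper's notation: each bit of the query is flipped independently with a constant probability before the hidden target is evaluated, and optionally the returned label is itself flipped to model classification noise.

import random

def noisy_mq(f, x, attr_flip_prob, label_flip_prob=0.0):
    """Membership query with product attribute noise (and, optionally,
    classification noise): each bit of the query x is flipped
    independently with probability attr_flip_prob, the hidden target f
    is evaluated at the corrupted point, and the returned label is
    flipped with probability label_flip_prob."""
    corrupted = [b ^ 1 if random.random() < attr_flip_prob else b for b in x]
    label = f(corrupted)
    if random.random() < label_flip_prob:
        label = -label  # assumes {-1, +1}-valued labels
    return label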
[abs] [pdf] [ps.gz] [ps]