ε-MDPs: Learning in Varying Environments
István Szita, Bálint Takács, András Lörincz;
3(Aug):145-174, 2002.
Abstract
In this paper, ε-MDP models are introduced and convergence
theorems are proven using the generalized MDP framework of
Szepesvari and Littman. Using this model family, we show that
Q-learning is capable of finding near-optimal policies in varying
environments. The potential of this new family of MDP models is
illustrated via a reinforcement learning algorithm called
event-learning, which separates the optimization of decision
making from the controller. We show that event-learning augmented
by a particular controller, which gives rise to an ε-MDP, enables
near-optimal performance even when considerable and sudden changes
occur in the environment. Illustrations are provided on the
two-segment pendulum problem.
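
The abstract's central claim, that Q-learning can still find near-optimal policies when the transition dynamics are perturbed, can be illustrated with a small self-contained sketch. The Python snippet below is not from the paper; the chain-walk environment, the perturbation probability EPSILON_ENV, and all learning parameters are illustrative assumptions standing in for an ε-MDP-style varying environment.

    # Minimal sketch (not the authors' code): tabular Q-learning on a chain
    # walk whose transitions are perturbed with probability EPSILON_ENV,
    # loosely mimicking an epsilon-MDP-style varying environment.
    # All names and parameter values are illustrative assumptions.
    import random

    N_STATES = 5          # states 0..4 on a chain; state 4 is the goal
    ACTIONS = (-1, +1)    # step left or right
    EPSILON_ENV = 0.1     # probability the intended move is flipped
    ALPHA, GAMMA = 0.1, 0.95
    EPS_GREEDY = 0.1

    def step(state, action):
        """Chain-walk dynamics; with prob. EPSILON_ENV the move is flipped."""
        if random.random() < EPSILON_ENV:
            action = -action
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        done = next_state == N_STATES - 1
        return next_state, reward, done

    Q = [[0.0, 0.0] for _ in range(N_STATES)]

    for episode in range(2000):
        s = 0
        for _ in range(50):
            # epsilon-greedy action selection
            a = (random.randrange(2) if random.random() < EPS_GREEDY
                 else max(range(2), key=lambda i: Q[s][i]))
            s2, r, done = step(s, ACTIONS[a])
            # standard Q-learning update; under bounded perturbations the
            # learned policy remains near-optimal rather than exactly optimal
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
            s = s2
            if done:
                break

    # greedy action per state (1 = move right toward the goal)
    print([max(range(2), key=lambda i: Q[st][i]) for st in range(N_STATES)])

Under these assumptions the learned greedy policy moves right in every state, despite the perturbed dynamics; this is only a toy stand-in for the convergence results and the event-learning experiments reported in the paper.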