Predictive Technology Lab > Papers > 2002 > Dynamically Identifying Regenerative Cycles in Simulation-Based Optimization Algorithms for Markov Chains

Dynamically Identifying Regenerative Cycles in Simulation-Based Optimization Algorithms for Markov Chains

Table of contents
No headers

Simulation-based algorithms for maximizing the average reward of a parameterized Markov chain often rely upon the existence of a state which is recurrent for all choices of parameter values. For example, in the "batch" simulation-based algorithm of Marbach and Tsitsiklis [28], a given recurrent state i* is used to mark the onset of regenerative cycles within a simulation of the process, and the data collected in each cycle give rise to asymptotically unbiased estimates of the gradient of the average reward of the process. The question of which recurrent state should serve as i� is a very important practical consideration in applications. Even when all states of the process are recurrent, some states tend to be visited more often than others, and lengthy renewal cycles tend to result in high variance estimates of the gradient. An appropriate choice of i* is especially dif�cult when the steady state distribution of the process depends strongly on the parameters of the underlying Markov chain. To address this difficulty, we analyze a recently-proposed mechanism for adjusting i* dynamically ( i*-adaptation [8]) as applied to the "batch" simulation-based algorithm of [28]. We show that the desirable convergence properties of the original algorithm are retained with i�-adaptation, namely the almost sure convergence of the parameter vector to a critical point, and we present an academic example which illustrates that i�-adaptation may signi�cantly expand the range of applications of the original methodology.

Files 1

FileSizeDateAttached by 
 DynamicIdentify.pdf
No description
312.91 kB07:56, 11 Jun 2008AdminActions
You must login to post a comment.