In recent years, interest in leveraging quantum effects to enhance machine learning tasks has increased significantly. Many algorithms speeding up supervised and unsupervised learning have been established. The first framework in which ways to exploit quantum resources specifically for the broader context of reinforcement learning were found is projective simulation. Projective simulation presents an agent-based reinforcement learning approach designed in a manner which may support quantum walk-based speed-ups. Although classical variants of projective simulation have been benchmarked against common reinforcement learning algorithms, very few formal theoretical analyses of its performance in standard learning scenarios have been provided. In this paper, we provide a detailed formal discussion of the properties of this model. Specifically, we prove that one version of the projective simulation model, understood as a reinforcement learning approach, converges to optimal behavior in a large class of Markov decision processes. This proof shows that a physically inspired approach to reinforcement learning can be guaranteed to converge.

We consider a general class of models in which a reinforcement learning (RL) agent learns from cyclic interactions with an external environment via classical signals. Perceptual inputs are encoded as quantum states, which are subsequently transformed by a quantum channel representing the agent's memory, while the outcomes of measurements performed at the channel's output determine the agent's actions. The learning takes place via stepwise modifications of the channel properties. These are described by an update rule that is inspired by the projective simulation (PS) model and equipped with a glow mechanism that allows for a backpropagation of policy changes, analogous to eligibility traces in RL and edge glow in PS. In this way, the model combines features of PS with the capacity for generalization offered by its physical embodiment as a quantum system. We apply the agent to various setups of an invasion game and a grid world, which serve as elementary model tasks allowing a direct comparison with a basic classical PS agent.
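To make the update rule and glow mechanism concrete, here is a minimal sketch of a *classical* PS agent with edge glow, not the quantum-channel model itself. It assumes the standard PS hyperparameters (a damping rate `gamma` pulling h-values back toward 1 and a glow decay `eta`); all class and parameter names are illustrative, not taken from the papers.

```python
import numpy as np

class PSAgent:
    """Minimal classical projective simulation (PS) sketch.

    h[s, a] are edge weights (h-values); g[s, a] is the 'glow'
    matrix that backpropagates rewards to recently used edges,
    analogous to eligibility traces in RL. Names are illustrative.
    """

    def __init__(self, n_percepts, n_actions, gamma=0.001, eta=0.1, rng=None):
        self.h = np.ones((n_percepts, n_actions))   # uniform initial policy
        self.g = np.zeros((n_percepts, n_actions))  # glow (eligibility) values
        self.gamma, self.eta = gamma, eta
        self.rng = rng if rng is not None else np.random.default_rng(0)

    def act(self, s):
        p = self.h[s] / self.h[s].sum()       # action probability ∝ h-value
        a = self.rng.choice(len(p), p=p)
        self.g *= (1.0 - self.eta)            # decay all glow values
        self.g[s, a] = 1.0                    # re-excite the traversed edge
        return a

    def learn(self, reward):
        # Damping pulls h back toward 1; reward reinforces glowing edges.
        self.h += -self.gamma * (self.h - 1.0) + reward * self.g

# Immediate-reward "invasion game" toy task: the correct action
# equals the percept, so eta=1.0 (glow resets fully each step).
agent = PSAgent(n_percepts=2, n_actions=2, eta=1.0)
for t in range(2000):
    s = t % 2
    a = agent.act(s)
    agent.learn(1.0 if a == s else 0.0)
```

In this immediate-reward setting the glow reduces to the last traversed edge; for delayed rewards, an `eta < 1` lets the reward also reinforce earlier edges along the trajectory, which is the behavior the glow mechanism is designed for.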