Reinforcement Learning for Parameterized Quantum State Preparation: A Comparative Study
Gerhard Stenzel, Michael Kölle, Tobias Rohe, Julian Hager, Leo Sünkel, Maximilian Zorn and Claudia Linnhoff-Popien
Abstract: We extend directed quantum circuit synthesis (DQCS) with reinforcement learning from purely discrete gate selection to parameterized quantum state preparation with continuous single-qubit rotations Rx, Ry, and Rz. We compare two training regimes: a one-stage agent that jointly selects the gate type, the affected qubit(s), and the rotation angle; and a two-stage variant that first proposes a discrete circuit and subsequently optimizes the rotation angles with Adam using parameter-shift gradients. Using Gymnasium and PennyLane, we evaluate Proximal Policy Optimization (PPO) and Advantage Actor–Critic (A2C) on systems comprising two to ten qubits and on targets of increasing complexity with λ ranging from one to five. Whereas A2C does not learn effective policies in this setting, PPO succeeds under stable hyperparameters (one-stage: learning rate approximately 5 × 10⁻⁴ with a self-fidelity-error threshold of 0.01; two-stage: learning rate approximately 10⁻⁴). Both approaches reliably reconstruct computational basis states (between 83% and 99% success) and Bell states (between 61% and 77% success). However, scalability saturates for λ of approximately three to four and does not extend to ten-qubit targets even at λ = 2. The two-stage method offers only marginal accuracy gains while requiring around three times the runtime. For practicality under a fixed compute budget, we therefore recommend the one-stage PPO policy and outline avenues to improve scalability.
Proceedings of the 17th International Conference on Agents and Artificial Intelligence: ICAART (2026)
Citation:
Gerhard Stenzel, Michael Kölle, Tobias Rohe, Julian Hager, Leo Sünkel, Maximilian Zorn, Claudia Linnhoff-Popien. “Reinforcement Learning for Parameterized Quantum State Preparation: A Comparative Study”. Proceedings of the 17th International Conference on Agents and Artificial Intelligence: ICAART 2026. To appear.
Bibtex:
@inproceedings{stenzel2026parameterized,
  author    = {Stenzel, Gerhard and K{\"o}lle, Michael and Rohe, Tobias and Hager, Julian and S{\"u}nkel, Leo and Zorn, Maximilian and Linnhoff-Popien, Claudia},
  title     = {Reinforcement Learning for Parameterized Quantum State Preparation: A Comparative Study},
  booktitle = {Proceedings of the 17th International Conference on Agents and Artificial Intelligence (ICAART)},
  year      = {2026},
  note      = {To appear}
}

