GSRM: Generative Speech Reward Model for Speech RLHF

Abstract

This work addresses limitations in speech naturalness evaluation by introducing a reasoning-based reward model, GSRM. Evaluation is decomposed into an interpretable acoustic feature extraction stage followed by feature-grounded chain-of-thought reasoning. The model is trained on 31,000 expert ratings and evaluated on real-world speech interactions. Results indicate it approaches human-level rating consistency and improves speech generation quality when used as the reward signal in reinforcement learning from human feedback (RLHF).
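As an illustration of the two-stage decomposition described above, the sketch below shows how such a pipeline could produce a scalar reward for an RLHF loop. All names (AcousticFeatures, extract_features, reason_to_reward, prompt_llm) and the specific feature set are assumptions for illustration, not the paper's actual interface.

```python
# Hypothetical sketch of a two-stage generative speech reward model.
# Names and the feature set are illustrative assumptions, not GSRM's API.
from dataclasses import dataclass


@dataclass
class AcousticFeatures:
    pitch_stability: float  # 0..1; steadiness of the F0 contour (assumed)
    speaking_rate: float    # syllables per second (assumed)
    pause_ratio: float      # fraction of the clip that is silence (assumed)


def extract_features(audio: list[float], sample_rate: int) -> AcousticFeatures:
    """Stage 1: interpretable acoustic feature extraction (stubbed here).

    A real front end would compute these descriptors from the waveform;
    the constants below are placeholders."""
    return AcousticFeatures(pitch_stability=0.82, speaking_rate=4.1,
                            pause_ratio=0.12)


def reason_to_reward(feats: AcousticFeatures, prompt_llm) -> float:
    """Stage 2: feature-grounded chain-of-thought reasoning to a scalar score.

    `prompt_llm` is any callable mapping a prompt string to generated text."""
    rationale = prompt_llm(
        "Given pitch stability {:.2f}, speaking rate {:.1f} syl/s, and pause "
        "ratio {:.2f}, reason step by step about naturalness, then output a "
        "score in [1, 5] on the final line.".format(
            feats.pitch_stability, feats.speaking_rate, feats.pause_ratio)
    )
    # Assumes the model's last output line holds the numeric score.
    return float(rationale.splitlines()[-1])
```

In an RLHF setup, the returned scalar would serve as the reward for each generated utterance; keeping stage 1 explicit is what makes the stage 2 rationale auditable.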

Publication
arXiv preprint