Hanyang Zhao

Hi! I am Hanyang, a third-year Ph.D. candidate in the Department of IEOR at Columbia University. I am fortunate to be advised by Professor Wenpin Tang and Professor David D. Yao. Prior to pursuing my Ph.D., I obtained my B.S. degree in Mathematics from Fudan University and my M.S. degree in Financial Engineering, also from Columbia.
My research focuses on reinforcement learning (RL) and generative models (LLMs and diffusion models), from both theoretical and practical perspectives. Recently, I have been working on discrete-space diffusion models and scalable methods for improving the post-training stages of generative models.
News
Mar 06, 2025 | A short version of Scores as Actions was accepted to the DeLTa Workshop at ICLR 2025! |
---|---|
Feb 05, 2025 | Our survey paper on preference learning was accepted by JAIR! |
Feb 04, 2025 | We propose a continuous-time RL method for RLHF of diffusion models. It outperforms the discrete-time RL baseline in robustness and stability and, thanks to its continuous-time nature, also adapts to diffusion models with high-order or black-box samplers. See our paper Scores as Actions! |
Jan 22, 2025 | Two papers on RLHF (MallowsPO and RainbowPO) were accepted to ICLR 2025! Big thanks and congrats to all my collaborators! |