Hanyang Zhao


Hi! I am Hanyang, a third-year Ph.D. candidate in the Department of IEOR at Columbia University. I am fortunate to be advised by Professor Wenpin Tang and Professor David D. Yao. Prior to pursuing my Ph.D., I obtained my B.S. in Mathematics from Fudan University and my M.S. in Financial Engineering, also from Columbia.

My research focuses on reinforcement learning (RL) and generative models (LLMs and diffusion models), from both theoretical and practical perspectives. Recently, I have been working on discrete (space) diffusion models and scalable methods for improving the post-training stages of generative models.

News

Mar 06, 2025 A short version of Scores as Actions was accepted by the DeLTa Workshop at ICLR 2025!
Feb 05, 2025 Our survey paper on preference learning was accepted by JAIR!
Feb 04, 2025 We propose a continuous-time RL method for RLHF of diffusion models, which outperforms discrete-time RL baselines in robustness and stability, and also adapts to diffusion models with higher-order or black-box samplers, thanks to its continuous-time nature. See our paper Scores as Actions!
Jan 22, 2025 Two papers on RLHF (MallowsPO and RainbowPO) were accepted by ICLR 2025! Big thanks and congrats to all my collaborators!

Selected Publications

  1. Diffusion RLHF
    Fine-Tuning Diffusion Generative Models via Rich Preference Optimization
    Hanyang Zhao, Haoxian Chen, Yucheng Guo, and 5 more authors
    arXiv preprint arXiv:2503.11720, 2025
  2. Diffusion RLHF
    Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
    Hanyang Zhao, Haoxian Chen, Ji Zhang, and 2 more authors
    arXiv preprint arXiv:2502.01819, 2025
    Short version in DeLTa Workshop at ICLR 2025.
  3. JAIR
    Preference tuning with human feedback on language, speech, and vision tasks: A survey
    Genta Indra Winata*, Hanyang Zhao*, Anirban Das*, and 4 more authors
    Journal of Artificial Intelligence Research, 2024
  4. ICLR 2025
    RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
    Hanyang Zhao*, Genta Indra Winata*, Anirban Das*, and 4 more authors
    International Conference on Learning Representations, 2025
  5. ICLR 2025
    MallowsPO: Fine-Tune Your LLM with Preference Dispersions
    Haoxian Chen*, Hanyang Zhao*, Henry Lam, and 2 more authors
    International Conference on Learning Representations, 2025
    Short version in Pluralistic Alignment Workshop at NeurIPS 2024.
  6. NeurIPS 2023
    Policy optimization for continuous reinforcement learning
    Hanyang Zhao, Wenpin Tang, and David Yao
    Advances in Neural Information Processing Systems, 2023