Hanyang Zhao


I am a fourth-year Ph.D. candidate in the Department of IEOR at Columbia University, advised by Professor Wenpin Tang and Professor David D. Yao. Prior to my Ph.D. studies, I obtained my B.S. degree in Mathematics at Fudan University and my M.S. degree in Financial Engineering, also at Columbia. I also spent two wonderful summers doing research internships at Netflix (summer 2025) and Capital One (summer 2024).

My research mainly focuses on reinforcement learning (RL) and generative models (both LLMs and diffusion models). I try to enhance the design space of algorithms from first mathematical principles by leveraging the structural properties of the underlying models. Some of my prior work on both the theoretical and practical sides includes:

(1) Efficient and effective off-policy RL for dLLM reasoning [DiFFPO];
(2) Continuous-time RL theory [CT-PPO] and its application to RLHF for diffusion models [Score as Action];
(3) Generalized preference modeling and optimization using the Mallows ranking model, going beyond Bradley-Terry [MallowsPO];
(4) A unified perspective on the design space of offline RLHF algorithms [RainbowPO].
I also did some earlier work on noise schedule design and convergence analysis for diffusion models: [CDPM], [Tutorials].

I am now on the industry job market! I am looking for research positions in AI. If you think I could be a good fit, please email me at hz2684 AT columbia DOT edu.

News

Sep 23, 2025 A short version of our dLLM post-training paper was accepted to the NeurIPS 2025 Efficient Reasoning workshop.
Aug 30, 2025 Spent a great summer interning at Netflix, working on post-training diffusion LLMs to reason better and faster!
May 01, 2025 Our Scores as Actions paper was accepted to ICML 2025! See everybody in Vancouver this summer!