Hanyang Zhao


Hi! I am Hanyang, a third-year Ph.D. candidate in the Department of IEOR at Columbia University. I am fortunate to be advised by Professor Wenpin Tang and Professor David D. Yao. Prior to pursuing my Ph.D., I obtained my B.S. in Mathematics from Fudan University and my M.S. in Financial Engineering, also from Columbia.

My research focuses on reinforcement learning (RL) and generative models (LLMs and diffusion models), from both theoretical and practical perspectives. Recently, I have been working on discrete (space) diffusion models and scalable methods for improving the post-training stages of generative models.

News

Mar 06, 2025 A short version of Scores as Actions was accepted by the DeLTa Workshop at ICLR 2025!
Feb 05, 2025 Our survey paper on preference learning was accepted by JAIR!
Feb 04, 2025 We propose a continuous-time RL method for RLHF of diffusion models, which outperforms discrete-time RL baselines in robustness and stability, and also adapts to diffusion models with higher-order or black-box samplers, thanks to its continuous-time nature. See our paper Scores as Actions!
Jan 22, 2025 Two papers on RLHF (MallowsPO and RainbowPO) were accepted by ICLR 2025! Big thanks and congrats to all my collaborators!

Selected Publications

  1. Diffusion RLHF
    Fine-Tuning Diffusion Generative Models via Rich Preference Optimization
    Hanyang Zhao, Haoxian Chen, Yucheng Guo, and 5 more authors
    arXiv preprint arXiv:2503.11720, 2025
  2. Diffusion RLHF
    Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
    Hanyang Zhao, Haoxian Chen, Ji Zhang, and 2 more authors
    arXiv preprint arXiv:2502.01819, 2025
    Short version in DeLTa Workshop at ICLR 2025.
  3. JAIR
    Preference tuning with human feedback on language, speech, and vision tasks: A survey
    Genta Indra Winata*, Hanyang Zhao*, Anirban Das*, and 4 more authors
    Journal of Artificial Intelligence Research, 2024
  4. ICLR 2025
    RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
    Hanyang Zhao*, Genta Indra Winata*, Anirban Das*, and 4 more authors
    International Conference on Learning Representations, 2025
  5. ICLR 2025
    MallowsPO: Fine-Tune Your LLM with Preference Dispersions
    Haoxian Chen*, Hanyang Zhao*, Henry Lam, and 2 more authors
    International Conference on Learning Representations, 2025
    Short version in Pluralistic Alignment Workshop at NeurIPS 2024.
  6. NeurIPS 2023
    Policy optimization for continuous reinforcement learning
    Hanyang Zhao, Wenpin Tang, and David Yao
    Advances in Neural Information Processing Systems, 2023