Mar 06, 2025 | A short version of Scores as Actions is accepted by DeLTa Workshop at ICLR 2025! |
Feb 05, 2025 | Our preference learning survey paper is accepted by JAIR! |
Feb 04, 2025 | We propose a continuous-time RL method for RLHF of diffusion models, which outperforms discrete-time RL baselines in robustness and stability, and also adapts to diffusion models with high-order or black-box samplers thanks to its continuous-time nature. See our paper Scores as Actions! |
Jan 22, 2025 | Two papers in RLHF (MallowsPO and RainbowPO) accepted by ICLR 2025! Big thanks and congrats to all my collaborators! |
Jan 20, 2025 | Will join Netflix as an ML Research Intern this coming summer! |
Nov 25, 2024 | Received a travel grant from NeurIPS 2024 for free registration! |
Nov 03, 2024 | We wrote an extensive survey paper summarizing recent progress in preference tuning techniques. See the 2nd version here and the github repo; any comments are welcome! Please email me if we missed any related references. |
Oct 09, 2024 | A short version of MallowsPO paper is accepted by Pluralistic Alignment Workshop at NeurIPS 2024. |
Oct 05, 2024 | Had a great summer at Capital One! My internship project, RainbowPO, is now available on arXiv! |
Jun 03, 2024 | I start my internship as an Applied Research PhD Intern at Capital One, working on LLM alignment (RLHF and DPO). |
May 23, 2024 | One paper in LLM alignment is available on arXiv; check our Mallows-DPO paper here. |
Jan 27, 2024 | One paper in Diffusion Models is available on arXiv; check our Contractive Diffusion Probabilistic Models paper here. |
Sep 15, 2023 | One paper in continuous-time RL is accepted by NeurIPS 2023; check the paper here. |
Sep 10, 2022 | I start my PhD in Operations Research at Columbia IEOR! |