news

Mar 06, 2025 A short version of Scores as Actions is accepted by the DeLTa Workshop at ICLR 2025!
Feb 05, 2025 Our preference learning survey paper is accepted by JAIR!
Feb 04, 2025 We propose a continuous-time RL method for RLHF of diffusion models, which outperforms the discrete-time RL baseline in robustness and stability and, thanks to its continuous-time nature, also adapts to diffusion models with high-order or black-box samplers. See our paper Scores as Actions!
Jan 22, 2025 Two papers on RLHF (MallowsPO and RainbowPO) accepted by ICLR 2025! Big thanks and congratulations to all my collaborators!
Jan 20, 2025 Will join Netflix as an ML Research Intern this coming summer!
Nov 25, 2024 Received a travel grant from NeurIPS 2024 for free registration!
Nov 03, 2024 We wrote an extensive survey paper summarizing recent progress in preference tuning techniques. See the 2nd version here and the github repo; any comments are welcome! Please email me if we missed any related references.
Oct 09, 2024 A short version of the MallowsPO paper is accepted by the Pluralistic Alignment Workshop at NeurIPS 2024.
Oct 05, 2024 Had a great summer at Capital One! My internship project, RainbowPO, is also now available on arXiv!
Jun 03, 2024 I start my internship as an Applied Research PhD Intern at Capital One, working on LLM alignment (RLHF and DPO).
May 23, 2024 One paper on LLM alignment available on arXiv; check out our Mallows-DPO paper here.
Jan 27, 2024 One paper on Diffusion Models available on arXiv; check out our Contractive Diffusion Probabilistic Models paper here.
Sep 15, 2023 One paper on continuous-time RL accepted by NeurIPS 2023; check the paper here.
Sep 10, 2022 I begin my PhD in Operations Research at Columbia IEOR!