Mar 06, 2025 | A short version of Scores as Actions is accepted by DeLTa Workshop at ICLR 2025! |
Feb 05, 2025 | Our preference learning survey paper is accepted by JAIR! |
Feb 04, 2025 | We propose a continuous-time RL method for RLHF of diffusion models, which outperforms discrete-time RL baselines in robustness and stability, and also adapts to diffusion models with high-order or black-box samplers thanks to its continuous-time nature. See our paper Scores as Actions! |
Jan 22, 2025 | Two papers in RLHF (MallowsPO and RainbowPO) accepted by ICLR 2025! Big thanks and congrats to all my collaborators! |
Jan 20, 2025 | Will join Netflix as an ML Research Intern this coming summer! |
Nov 25, 2024 | Received a travel grant from NeurIPS 2024 for free registration! |
Nov 03, 2024 | We wrote an extensive survey paper summarizing recent progress in preference tuning techniques. See the 2nd version here and the github repo; any comments are welcome! Please email me if we missed any related references. |
Oct 09, 2024 | A short version of MallowsPO paper is accepted by Pluralistic Alignment Workshop at NeurIPS 2024. |
Oct 05, 2024 | Had a great summer at Capital One! My internship project, RainbowPO, is now available on arXiv! |
Jun 03, 2024 | I start my internship as an Applied Research PhD Intern at Capital One, working on LLM alignment (RLHF and DPO). |
May 23, 2024 | One paper in LLM alignment is available on arXiv; check our Mallows-DPO paper here. |
Jan 27, 2024 | One paper in Diffusion Models is available on arXiv; check our Contractive Diffusion Probabilistic Models paper here. |
Sep 15, 2023 | One paper in continuous-time RL is accepted by NeurIPS 2023; check the paper here. |
Sep 10, 2022 | I start my PhD in Operations Research at Columbia IEOR! |