
I write tech reports about AI, ML, and systems
Latest from the Blog
RecSys for Real-time AI Agents
LLMs need a RecSys layer to truly understand users. LLMs are powerful general reasoning engines, but they are not optimized to model long-term user-preference evolution. Traditional recommender systems solved this decades ago by learning persistent user representations from interaction timelines. The key idea is simple: treat a user’s history as a structured signal, not…
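The idea of a persistent user representation learned from an interaction timeline can be hinted at with a minimal sketch. All names here (`user_embedding`, the recency half-life) are illustrative assumptions, not taken from the post:

```python
# Hypothetical sketch: build a persistent user vector from a timeline of
# (item_id, days_ago) interactions, weighting recent events more heavily
# via exponential recency decay.
def user_embedding(events, item_vecs, half_life=7.0):
    """events: list of (item_id, days_ago); item_vecs: dict item_id -> vector."""
    dim = len(next(iter(item_vecs.values())))
    acc, total = [0.0] * dim, 0.0
    for item_id, days_ago in events:
        w = 0.5 ** (days_ago / half_life)  # weight halves every `half_life` days
        acc = [a + w * v for a, v in zip(acc, item_vecs[item_id])]
        total += w
    return [a / total for a in acc] if total else acc

# A two-week-old interaction contributes far less than one from today.
profile = user_embedding(
    [("a", 0.0), ("b", 14.0)],
    {"a": [1.0, 0.0], "b": [0.0, 1.0]},
)  # → [0.8, 0.2]
```

Real systems would learn these representations rather than hand-weight them, but the structural point is the same: the timeline, not the single query, is the signal.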
Scaling Reinforcement Learning with Verifiable Reward (RLVR)
The Basics: Post-training – Scaling Test-Time Compute. The biggest innovation from OpenAI's o1 is that it proves test-time scaling is another dimension beyond scaling data and model parameters. Both RL and best-of-n share a common structure, differing only in when the optimization cost is paid: RL pays it during training, best-of-n pays it during…
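The best-of-n side of that structure is simple enough to sketch directly: spend extra compute at inference by sampling n candidates and keeping the one a scorer prefers. The `generate` and `score` callables below are toy stand-ins for a policy and a verifier/reward model, not anything from the post:

```python
import random

# Hedged sketch of best-of-n test-time compute: sample n candidates,
# score each with a verifier, return the highest-scoring one.
def best_of_n(generate, score, n=8):
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)

random.seed(0)
best = best_of_n(
    generate=lambda: random.random(),  # toy "policy sample"
    score=lambda x: -abs(x - 0.5),     # toy verifier: prefer values near 0.5
)
```

RL would instead fold the scorer's signal back into the policy's weights during training, so no extra samples are needed at inference; the optimization is the same, only its timing differs.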
LLM-RL Fine-Tuning – Math Collections
A humble attempt to establish a systematic, theoretical understanding of LLM RL fine-tuning. This is an initial effort to summarize how traditional RL loss formulations transition into those used in LLMs. Note that this is an ongoing list; I plan to gradually enrich it with more equations as the framework matures.
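As one example of the kind of transition such a collection would track, the standard clipped surrogate objective from PPO, widely used in LLM fine-tuning, is written below in its textbook form (this is the generic formulation, not an equation taken from the post):

```latex
\[
\mathcal{L}^{\text{PPO}}(\theta)
  = -\,\mathbb{E}_t\!\left[
      \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
                  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
\]
```

In the LLM setting, states become prompt-plus-prefix contexts and actions become next tokens, which is exactly the kind of mapping a systematic collection of loss formulations would make explicit.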