
I write about AI, ML, and systems tech reports
Latest from the Blog
Memory-Transformer Research Tree
Carrying on from the last blog: https://shermwong.com/2026/02/22/recsys-for-real-time-ai-agents/ – a higher-level depiction of possible future states of personalized, real-time AI agents. Here we dig into a deeper topic: how backbone transformer models have evolved so far on memory architectures, a key module for enabling continuous adaptation of agents. Current…
RecSys for Real-time AI Agents
LLMs need a RecSys layer to truly understand users. LLMs are powerful general reasoning engines, but they are not optimized to model long-term user preference evolution. Traditional recommender systems solved this decades ago by learning persistent user representations from interaction timelines. The key idea is simple: treat a user’s history as a structured signal, not…
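The idea of a persistent user representation learned from an interaction timeline can be sketched in a few lines. This is my own illustration, not the post's actual method: `Interaction`, `user_representation`, and the recency-decay scheme are all assumptions chosen to make the idea concrete.

```python
# Toy sketch (illustrative, not the post's method): build a persistent
# user vector by pooling pretrained item embeddings from the user's
# interaction timeline, weighted so recent interactions dominate.
from dataclasses import dataclass

@dataclass
class Interaction:
    item_embedding: list[float]  # pretrained vector for the interacted item
    age_days: float              # how long ago the interaction happened

def user_representation(timeline: list[Interaction],
                        half_life: float = 30.0) -> list[float]:
    """Recency-weighted average of item embeddings: the weight of an
    interaction halves every `half_life` days, so the resulting vector
    tracks preference drift instead of treating all history equally."""
    dim = len(timeline[0].item_embedding)
    acc = [0.0] * dim
    total = 0.0
    for it in timeline:
        w = 0.5 ** (it.age_days / half_life)  # exponential recency decay
        total += w
        for i, v in enumerate(it.item_embedding):
            acc[i] += w * v
    return [a / total for a in acc]
```

A recommender would then score candidate items against this vector; the point is only that the history is consumed as a structured, time-aware signal rather than as raw text.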
Scaling Reinforcement Learning with Verifiable Reward (RLVR)
The Basics: Post-training – Scaling Test-Time Compute. The biggest innovation from GPT-o1 is that it proves test-time scaling is another dimension besides scaling data and model parameters. Both RL and best-of-n therefore share a common structure, differing only in when the optimization cost is paid: RL pays it during training, best-of-n pays it during…
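The training-time vs. inference-time distinction can be made concrete with a minimal best-of-n sketch. The `generate` and `score` functions below are stand-ins of my own invention (a real system would sample from an LLM and score with a verifier or reward model); only the selection structure reflects the excerpt.

```python
# Toy best-of-n sketch: sample n candidates per query and keep the one a
# verifier scores highest. The extra compute is paid at inference time,
# per query -- unlike RL, which pays its optimization cost once, during
# training. `generate` and `score` are illustrative placeholders.
import random

def generate(prompt: str, rng: random.Random) -> str:
    # Stand-in for sampling one candidate answer from a language model.
    return f"{prompt}: draft-{rng.randint(0, 999)}"

def score(candidate: str) -> float:
    # Stand-in for a verifier / reward model (here, a trivial heuristic).
    return (sum(candidate.encode()) % 100) / 100.0

def best_of_n(prompt: str, n: int = 8, seed: int = 0) -> str:
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)
```

Increasing `n` buys quality with more inference compute, which is exactly the test-time scaling axis the excerpt describes.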