LLM-RL Fine-Tuning – Math Collections

A humble attempt to establish a systematic, theoretical understanding of LLM RL fine-tuning.

This is an initial effort to summarize how traditional RL loss formulations transition into those used in LLMs. Note that this is an ongoing list; I plan to gradually enrich it with more equations as the framework matures.
