A humble attempt to establish a systematic, theoretical understanding of LLM RL fine-tuning.
This is an initial effort to summarize how traditional RL loss formulations transition into those used for LLMs. Note that this is an ongoing list; I plan to gradually enrich it with more equations as the framework matures.