Teaching LLMs to Be Funny
Summary
The article documents an experiment in making the trillion-parameter Kimi K2 model funny through rubric-based reinforcement learning. It details decomposing humor into verifiable rubrics, building a data pipeline from social media and humor publications, and iterating with supervised fine-tuning (SFT) and RL to improve humorous outputs, noting what worked and what didn't.
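The core idea of rubric-based RL — decompose a fuzzy goal like "funny" into independently checkable criteria, score each, and use the aggregate as the reward signal — can be sketched as follows. The rubric names and the keyword-style checks here are illustrative stand-ins; the article's actual rubrics would be graded by an LLM judge, not simple heuristics.

```python
# Hypothetical sketch of rubric-based reward scoring: humor is decomposed
# into verifiable rubrics, each scored independently, and the mean score
# becomes the scalar RL reward. Rubric names and checks are invented
# examples, not the article's actual criteria.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rubric:
    name: str
    check: Callable[[str], float]  # returns a score in [0.0, 1.0]

def reward(text: str, rubrics: List[Rubric]) -> float:
    """Mean rubric score, used as the scalar reward during RL."""
    return sum(r.check(text) for r in rubrics) / len(rubrics)

# Toy rubrics; a real pipeline would replace these with an LLM grader
# applying each rubric's written criteria.
rubrics = [
    Rubric("has_complete_sentences", lambda t: 1.0 if "." in t else 0.0),
    Rubric("brevity", lambda t: 1.0 if len(t.split()) <= 30 else 0.0),
]

joke = "I told my model a joke about recursion. It's still laughing about it."
print(round(reward(joke, rubrics), 2))  # → 1.0 (both toy rubrics pass)
```

Scoring each rubric separately, rather than asking a judge "is this funny?" outright, is what makes the reward verifiable and less gameable during RL.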