Diff of Autogollark at 8df6219

@@ -24,3 +24,3 @@ Autogollark currently comprises the dataset, the search API server and the [[htt
 * Synthetic via instruct model.
-* {RL (also include reasoning, of course). Probably hard though (sparse rewards). https://arxiv.org/abs/2403.09629. [[https://arxiv.org/abs/2503.22828]] would probably work.
+* {RL (also include reasoning, of course). Probably hard though (sparse rewards). https://arxiv.org/abs/2403.09629. [[https://arxiv.org/abs/2503.22828]] would probably work. [[https://arxiv.org/abs/2505.15778]]
 * Unclear whether model could feasibly learn tool use "from scratch", so still need SFT pipeline.
@@ -35,3 +35,3 @@ Autogollark currently comprises the dataset, the search API server and the [[htt
 }
-* MCTS over conversations with non-gollark simulacra? Should find //something// to use spare parallelism on local inference. Best-of-n?
+* MCTS over conversations with non-gollark simulacra? Should find //something// to use spare parallelism on local inference. Best-of-n? https://arxiv.org/abs/2505.10475
 * {Longer context, mux several channels.