Diff of Autogollark at 1fee9e3

@@ -24,3 +24,3 @@ Autogollark currently comprises the dataset, the search API server and the [[htt
 * Synthetic via instruct model.
-* {RL (also include reasoning, of course). Probably hard though (sparse rewards). https://arxiv.org/abs/2403.09629. [[https://arxiv.org/abs/2503.22828]] would probably work. [[https://arxiv.org/abs/2505.15778]] [[https://arxiv.org/abs/2505.24864]]
+* {RL (also include reasoning, of course). Probably hard though (sparse rewards). https://arxiv.org/abs/2403.09629. [[https://arxiv.org/abs/2503.22828]] would probably work. [[https://arxiv.org/abs/2505.15778]] [[https://arxiv.org/abs/2505.24864]] [[https://arxiv.org/abs/2509.06160]]
 * Unclear whether model could feasibly learn tool use "from scratch", so still need SFT pipeline.