@@ -24,3 +24,3 @@ Autogollark currently comprises the dataset, the search API server and the [[htt
* Synthetic via instruct model.
-* {RL (also include reasoning, of course). Probably hard though (sparse rewards). https://arxiv.org/abs/2403.09629. [[https://arxiv.org/abs/2503.22828]] would probably work. [[https://arxiv.org/abs/2505.15778]]
+* {RL (also include reasoning, of course). Probably hard though (sparse rewards). https://arxiv.org/abs/2403.09629. [[https://arxiv.org/abs/2503.22828]] would probably work. [[https://arxiv.org/abs/2505.15778]] [[https://arxiv.org/abs/2505.24864]]
* Unclear whether model could feasibly learn tool use "from scratch", so still need SFT pipeline.