Diff of Good Ideas at ea2e386

@@ -5,2 +5,3 @@
 * Combine new CRL (https://arxiv.org/abs/2408.05804) with offline pretraining.
+* Similarly, contrastive RL for computer algebra (specifically, proving that one expression equals another by repeatedly applying substitutions). Try to contrastively learn a "how close is this expression to the target expression" function (I think with the substitution as an action input?). Bootstrap to progressively harder problems.
 * {