Diff of Compression Is Prediction Is Intelligence at 68880ca

@@ -0,0 +1 @@
+Any model that assigns probabilities to sequences can be turned into a compression algorithm (e.g. with [[arithmetic coding]], which encodes a sequence of probability p in roughly -log2(p) bits). Symmetrically, any compression algorithm can be interpreted as assigning probabilities to sequences: a sequence that compresses to n bits gets probability on the order of 2^-n (e.g. by splitting probability mass of 1/2, 1/4, 1/8, ... between all compressed outputs of length 0, 1, 2, ... bits). [[Intelligence]] is (mostly) predicting what will happen next in the world. This is the thinking behind the [[http://prize.hutter1.net/|Hutter Prize]] and [[large language model|large language models]]: the [[cross-entropy loss]] used in pretraining is exactly the expected code length of an optimal (tokenwise) compressor, so better prediction is better compression.
\ No newline at end of file
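
A minimal sketch of both directions of the correspondence, in Python. It is illustrative, not the Hutter Prize's actual setup: a Laplace-smoothed byte-frequency model stands in for a learned predictor, zlib stands in for an arbitrary compressor, and the function names are invented for this example. The first function sums -log2 p over tokens, which is both the sequence's cross-entropy loss in bits and (within about 2 bits) the length an arithmetic coder driven by that model would produce; the second reads a compressor's output length back as a log-probability.

```python
import math
import zlib
from collections import Counter

def ideal_code_length_bits(text):
    """Direction 1: model -> compressor.
    Bits an arithmetic coder would need when driven by an adaptive
    Laplace-smoothed byte model (a deliberately weak stand-in for an LLM).
    The returned value is exactly the summed cross-entropy loss in bits."""
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for byte in text.encode():
        p = (counts[byte] + 1) / (seen + 256)  # Laplace-smoothed prediction
        total_bits += -math.log2(p)            # this token's cross-entropy, in bits
        counts[byte] += 1                      # update the model after seeing the token
        seen += 1
    return total_bits

def implied_log2_probability(text):
    """Direction 2: compressor -> model.
    A compressor that maps a sequence to L bits implicitly assigns it
    probability about 2^-L, so shorter output means 'more likely'."""
    compressed_bits = 8 * len(zlib.compress(text.encode()))
    return -compressed_bits  # log2 of the implied probability

sample = "the quick brown fox jumps over the lazy dog " * 20
print(f"model-as-compressor: {ideal_code_length_bits(sample):.1f} bits "
      f"vs {8 * len(sample.encode())} raw bits")
print(f"compressor-as-model: log2 P(sample) ~= {implied_log2_probability(sample)}")
```

Swapping the byte-frequency model for a language model's per-token probabilities turns the same identity into a practical text compressor, which is the sense in which lower pretraining loss literally means better compression.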