Almost all modern LLMs map relatively low-dimensional hidden states to high-dimensional probability distributions over tokens using a single matrix and a softmax operation. The rank of this transformation is limited to the hidden size, so not all valid probability distributions can be represented. This has a number of consequences.
References: