
GPT-4

GPT-4 is a large language model by OpenAI, trained in 2022 and released in 2023. Like GPT-3, it was released only through an API; unlike GPT-3, its technical report disclosed almost no technical details such as architecture, dataset and parameter count, though credible leaks later provided some of this information.

Architecture

  • 1.8 trillion parameters in total.

  • 16-way Mixture of Experts, with two experts used per forward pass.

  • Trained on 25,000 A100 GPUs for ~90 days.

  • ~13T tokens of training data (not all unique; run for multiple epochs).
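
To make the figures above concrete, here is a minimal sketch of what "16 experts, two used per forward pass" means, together with the GPU-hour arithmetic implied by the leaked training figures. It is a generic top-2 gating illustration, not GPT-4's actual routing, which has not been published. The names `top_k_gate`, `moe_forward` and `gpu_hours`, the toy scalar experts, and the choice to softmax over only the selected logits are all assumptions made for illustration.

```python
import math

NUM_EXPERTS = 16  # leaked figure from the list above
TOP_K = 2         # experts used per forward pass (leaked figure)


def top_k_gate(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Assumption: weights are a softmax over the selected logits only,
    so they sum to 1. Real systems may normalize differently.
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k=TOP_K):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in top_k_gate(router_logits, k))


def gpu_hours(num_gpus, days):
    """Total GPU-hours for a run that keeps num_gpus busy for the given days."""
    return num_gpus * days * 24


# Toy experts (hypothetical): expert i multiplies its input by i + 1.
experts = [lambda x, scale=i + 1: x * scale for i in range(NUM_EXPERTS)]

# Hypothetical router scores for one token: experts 3 and 7 tie for the top.
logits = [0.0] * NUM_EXPERTS
logits[3] = 2.0
logits[7] = 2.0

selected = top_k_gate(logits)
output = moe_forward(1.0, experts, logits)
total_gpu_hours = gpu_hours(25_000, 90)
```

With these toy inputs only experts 3 and 7 run, each with weight 0.5; the other 14 experts are never evaluated, which is why a Mixture of Experts model uses far fewer parameters per token than its total count. The leaked training figures work out to 25,000 × 90 × 24 = 54 million A100 GPU-hours.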

GPT-4-base

Unlike GPT-3, which was offered as a base model, GPT-4 was released to the general public only in various instruction-tuned forms, for "safety". GPT-4's base model is accessible under NDA. According to the limited information from those who have used it,

Sydney