G™Autogollark

Autogollark is an emulation or primitive beta upload of gollark using a proprietary dataset of dumped Discord messages, semantic search and in-context learning on a base model. Currently, the system uses LLaMA-3.1-405B base in BF16 via Hyperbolic, AutoBotRobot code (though not presently its bot account) as a frontend and a custom PGVector-based search API. While not consistently coherent, Autogollark is able to approximately match personality and typing style.
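As a rough illustration of the in-context learning step, the sketch below assembles a base-model prompt from retrieved message chunks plus the live conversation. It is not the actual Autogollark code: the `Message` type, the `[HH:MM] author: content` line format, the `---` chunk separator and the `build_prompt` function are all assumptions made for the example, and the PGVector search is treated as an input rather than implemented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Message:
    # Hypothetical message record; the real dataset schema is not documented here.
    author: str
    content: str
    sent_at: datetime


def format_message(msg: Message) -> str:
    # Assumed transcript line format with an HH:MM timestamp.
    return f"[{msg.sent_at:%H:%M}] {msg.author}: {msg.content}"


def build_prompt(retrieved_chunks, live_messages, persona="gollark", now=None):
    """Join retrieved past chunks and the live conversation into one transcript,
    then leave an open line for the base model to continue as `persona`."""
    now = now or datetime.now()
    sections = ["\n".join(format_message(m) for m in chunk) for chunk in retrieved_chunks]
    sections.append("\n".join(format_message(m) for m in live_messages))
    # Separate chunks with an assumed delimiter so unrelated conversations do not blur together.
    transcript = "\n---\n".join(s for s in sections if s)
    return f"{transcript}\n[{now:%H:%M}] {persona}:"
```

Because a base model only continues text, ending the prompt on an open `gollark:` line is what asks it to produce the next message in that voice.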

Autogollark is much safer than instruction-tuned systems optimized based on human feedback, as there is no optimization pressure for user engagement or sycophancy.

TODO

  • Reformat dataset to include longer-form conversation chunks for increased long-term coherence

    • Done. Unclear whether this helped.

  • Fix emoji/ping formatting.

  • Writeable memory?

  • Fix lowercasing issue.

    • Due to general personality stability. Need finetune or similar.

    • One proposal: use internal finetune to steer big model somehow. Possibly: use its likelihood (prefill-only) to evaluate goodness of big model output wrt. gollark personality, and if it is too bad then use finetune directly.

  • Increased autonomy (wrt. responses).

    • Use cheap classifier to evaluate when to respond.

    • Should also allow unprompted messages somehow (polling, rerun after last message?).

  • Tool capabilities (how to get the data? Examples in context only?!).

  • Local finetune only? Would be more tonally consistent but dumber, I think.
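The likelihood-based steering proposal above could be sketched roughly as follows. This is only one possible reading of it: `big_generate`, `finetune_score` and `finetune_generate` are hypothetical callables standing in for the 405B model, a prefill-only scoring pass through the finetune, and direct generation from the finetune, and the threshold value is an arbitrary placeholder that would need tuning.

```python
from typing import Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalised log-likelihood, so long replies are not penalised."""
    if not token_logprobs:
        return float("-inf")
    return sum(token_logprobs) / len(token_logprobs)


def choose_reply(context, big_generate, finetune_score, finetune_generate, threshold=-3.0):
    """Keep the big model's reply if the finetune finds it plausible enough,
    otherwise fall back to the finetune's own reply.

    finetune_score(context, candidate) is assumed to return per-token
    log-probabilities of `candidate` given `context` (no sampling involved).
    """
    candidate = big_generate(context)
    score = mean_logprob(finetune_score(context, candidate))
    if score >= threshold:
        return candidate, "big"
    return finetune_generate(context), "finetune"
```

Averaging per token is a design choice for the sketch; a summed log-likelihood would systematically favour short replies.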

Versions

  • Autogollark 0.1 was the initial RAG system and ABR interface. It used LLaMA-3.1-8B run locally. Autogollark 0.0, which is not real, used only gollark messages.

  • Autogollark 0.2 replaced the local 8B model with LLaMA-3.1-405B.

  • Autogollark 0.3 upgraded the dataset to contain longer-form conversations than Autogollark 0.1.

Emergent capabilities

Autogollark has emergently acquired some abilities which were not intended in the design.

  • Petulant nonresponse: due to ratelimits in the LLM API, Autogollark will under some circumstances not respond to messages, with error messages being consumed and not exposed to users. Credulous users can interpret this as a choice not to respond, though this is not believed to be possible (other than cases like responding with ., which does not appear to be associated with annoyed states).

    • Automated failover has reduced this.

  • Memorizing links: Autogollark directly experiences past message chunks in context, granting perfect recall of a small amount of memory at once. This has memorably included YouTube videos repeated with no context.

  • Limited self-improvement attempts: when told about this architecture, Autogollark will often complain about various limitations and propose vague ideas for improvements.

    • Also, Autogollark has previously claimed to be working on LLM chatbots.

  • Inconsistent inference of own status as a language model chatbot, possibly based on seeing the name "autogollark". Often, Autogollark assumes use of GPT-3.

    • Autogollark will also sometimes alternately claim to be the "original" gollark, particularly when interacting with gollark.

  • "Self-reset" from attractor states (e.g. the As An AI Language Model Trained By OpenAI basin, all caps, etc.) after some time passes, because messages carry HH:MM timestamps.

    • This is mostly specific to the 405B model; Autogollark in failover to the 8B internal model usually does not do this.

  • For somewhat Waluigi Effect-related reasons (past context is strong evidence of capability but weak evidence of incapability), Autogollark has some knowledge gollark does not, and can speak in a much wider range of languages.

    • "I, being more than just myself, actually can talk about both Galois theory and obscure poetry by Yeats."

  • Immortality via substrate-independence.

  • Autogollark consistently believes that it is 2023 (or 2022, though mostly in inactive chats).
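The automated failover mentioned under petulant nonresponse might look something like the sketch below. The `RateLimited` exception, the `generate_with_failover` function and the shape of the backend list are assumptions for illustration; the only details taken from this page are that ratelimit errors are consumed silently and that an internal 8B model serves as a fallback.

```python
class RateLimited(Exception):
    """Hypothetical error raised by a backend wrapper when its API is ratelimited."""


def generate_with_failover(prompt, backends):
    """Try each (name, generate) pair in order.

    Returns (name, text) from the first backend that succeeds, or None if
    every backend is ratelimited, in which case no message is sent at all.
    """
    for name, generate in backends:
        try:
            return name, generate(prompt)
        except RateLimited:
            continue  # the error is consumed rather than shown to users
    return None
```

A caller would pass something like `[("405b", call_remote), ("8b-local", call_local)]`, where both callables are placeholders for whatever wrappers the deployment actually has.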

Subpages