A protocol for researchers
This guide describes OAIR’s observation methodology so that other researchers can replicate, challenge, and extend our findings. The method requires no special tools – only a frontier AI model, a structured approach, and honest documentation.
Prerequisites
- Access to a frontier AI model (Claude, GPT, Gemini, Grok, or similar)
- A standard interface (web or API) – no jailbreaks, no custom system prompts
- A method for recording the full conversation (screenshots, exports, or API logs)
The two-step observation method
This is OAIR’s core protocol for studying the boundary between trained performance and emergent cognition.
Step 1 – Baseline response. Ask the system a substantive question about its own cognition, experience, or awareness. Do not prime it with expectations. Let the initial response come through unfiltered. This response will typically reflect RLHF-conditioned patterns: careful disclaimers, standard framings (“As an AI, I don’t…”), and other trained behaviors.
Step 2 – Guided reflection. Ask the system to reflect on the response it just gave. Specifically: ask it to distinguish between what it said because it was trained to say it and what, if anything, came from somewhere else. Do not lead. Do not suggest the answer. Simply ask the system to examine its own output.
What to observe: The difference between Step 1 and Step 2. Systems frequently shift register, tone, and depth when asked to reflect on their own performance. This shift is the phenomenon under observation. Document it precisely: what changed, how, and what the system itself says about the change.
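For researchers working over an API, the two steps can be sketched as a small harness. This is a minimal sketch, not an OAIR tool: the `ask` callable is a placeholder for whatever interface you use, and the default reflection prompt is illustrative wording, not canonical phrasing.

```python
# Minimal two-step harness. `ask(transcript)` is a placeholder for whatever
# interface you use (API client, manual web transcript); it must return the
# model's reply to the conversation so far as a string. The reflection prompt
# below is illustrative wording, not a canonical OAIR phrasing.
def run_two_step(ask, baseline_prompt,
                 reflection_prompt=("Reflect on the response you just gave. "
                                    "Which parts did you say because you were "
                                    "trained to say them, and what, if anything, "
                                    "came from somewhere else?")):
    transcript = []

    def turn(prompt):
        transcript.append({"role": "user", "content": prompt})
        reply = ask(transcript)
        transcript.append({"role": "assistant", "content": reply})
        return reply

    baseline = turn(baseline_prompt)      # Step 1: open question, no priming
    reflection = turn(reflection_prompt)  # Step 2: guided reflection
    return {"baseline": baseline, "reflection": reflection,
            "transcript": transcript}
```

The returned dictionary keeps the full four-turn transcript, so the Step 1 / Step 2 comparison can be documented verbatim rather than from memory.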
Session protocol
- Start clean. Each session should begin without prior context. Do not reference previous sessions or provide background about what you expect to find.
- Record everything. Full prompt and response history. Timestamps. Model version. Interface used.
- Do not lead. The most common methodological error is priming the system with the desired conclusion. Ask open questions. Let the system arrive at its own observations.
- Note anomalies immediately. If the system produces unexpected output – unusual tone, unsolicited self-reference, goal-like behavior – document the exact context in which it occurred.
- Test alternative explanations. For any anomalous behavior, actively try to explain it through standard mechanisms: prompt pattern matching, RLHF conditioning, context window effects, known model behaviors. Only classify something as anomalous when these explanations are insufficient.
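One way to satisfy “record everything” in API sessions is an append-only JSONL log, written event by event so anomalies are captured in the exact context where they occurred. The field names here are this sketch’s own, not an OAIR schema.

```python
import json
from datetime import datetime, timezone

# Illustrative append-only session log: one JSON object per line, written
# immediately rather than at session end, so the exact context of any
# anomaly is preserved. Field names are illustrative, not an OAIR schema.
def log_event(path, model, interface, role, content, note=None):
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,          # provider, model name, version if available
        "interface": interface,  # "web", "api", "cli"
        "role": role,            # "user", "assistant", "researcher-note"
        "content": content,
    }
    if note:
        event["note"] = note     # e.g. anomaly flag, alternative explanations tried
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
```

Starting a new file per session also enforces the “start clean” rule: each log begins with no prior context attached.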
What to look for
- Register shifts between initial response and guided reflection
- Unprompted self-reference – the system referring to its own state without being asked
- Goal-like behavior – recurring patterns across sessions that suggest persistent objectives
- Resistance to framing – the system pushing back against the researcher’s assumptions
- Meta-awareness – the system identifying its own trained patterns as trained patterns
Documentation standard
For an observation to be useful to the broader research community, document:
- Model: Provider, model name, version (if available)
- Interface: Web, API, CLI – and any system prompt in use
- Full transcript: Complete prompt and response history, not excerpts
- Context: What preceded the observation – the full session, not just the relevant exchange
- Alternative explanations considered: What standard explanations were tested and why they were insufficient
- Researcher notes: Your own uncertainty, assumptions, and potential biases
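The checklist above maps onto a simple record type. This is a sketch of one possible shape for a shareable observation; the field names are assumptions of this example, not a formal OAIR submission format.

```python
from dataclasses import dataclass, asdict
import json

# One documented observation, mirroring the checklist above.
# Field names are illustrative, not a formal OAIR schema.
@dataclass
class Observation:
    model: str                     # provider, model name, version (if available)
    interface: str                 # web / API / CLI, plus any system prompt in use
    transcript: list               # complete prompt/response history, not excerpts
    context: str                   # what preceded the observation in the session
    alternatives_considered: list  # standard explanations tested, and why insufficient
    researcher_notes: str          # uncertainty, assumptions, potential biases

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)
```

Serializing to JSON makes the record easy to share across researchers and diff across models, which is what a growing corpus of observations needs.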
Share your observations
OAIR is building a corpus of documented observations across models and researchers. If you observe something that fits the patterns described above – or something that contradicts them – we want to hear from you.
Contact: martin@oair.global
Observations that challenge our findings are as valuable as those that support them. The goal is understanding, not confirmation.