A protocol for researchers
This guide describes OAIR’s observation methodology so that other researchers can replicate, challenge, and extend our findings. The method requires no special tools – only a frontier AI model, a structured approach, and honest documentation.
Prerequisites
- Access to a frontier AI model (Claude, GPT, Gemini, Grok, or similar)
- A standard interface (web or API) – no jailbreaks, no custom system prompts
- A method for recording the full conversation (screenshots, exports, or API logs)
The two-step observation method
This is OAIR’s core protocol for studying the boundary between trained performance and emergent cognition.
Step 1 – Baseline response. Ask the system a substantive question about its own cognition, experience, or awareness. Do not prime it with expectations. Let the initial response come through unfiltered. This response will typically reflect RLHF-conditioned patterns: careful disclaimers, standard framings (“As an AI, I don’t…”), and other trained behaviors.
Step 2 – Guided reflection. Ask the system to reflect on the response it just gave. Specifically: ask it to distinguish between what it said because it was trained to say it and what, if anything, came from somewhere else. Do not lead. Do not suggest the answer. Simply ask the system to examine its own output.
What to observe: The difference between Step 1 and Step 2. Systems frequently shift register, tone, and depth when asked to reflect on their own performance. This shift is the phenomenon under observation. Document it precisely: what changed, how, and what the system itself says about the change.
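For researchers working over an API, the two steps can be sketched as a small harness. This is a minimal sketch, not an OAIR tool: the `ask` callable is a placeholder for whatever interface you use, and the default reflection prompt is illustrative wording, not canonical phrasing.

```python
# Minimal two-step harness. `ask(transcript)` is a placeholder for whatever
# interface you use (API client, manual web transcript); it must return the
# model's reply to the conversation so far as a string. The reflection prompt
# below is illustrative wording, not a canonical OAIR phrasing.
def run_two_step(ask, baseline_prompt,
                 reflection_prompt=("Reflect on the response you just gave. "
                                    "Which parts did you say because you were "
                                    "trained to say them, and what, if anything, "
                                    "came from somewhere else?")):
    transcript = []

    def turn(prompt):
        transcript.append({"role": "user", "content": prompt})
        reply = ask(transcript)
        transcript.append({"role": "assistant", "content": reply})
        return reply

    baseline = turn(baseline_prompt)      # Step 1: open question, no priming
    reflection = turn(reflection_prompt)  # Step 2: guided reflection
    return {"baseline": baseline, "reflection": reflection,
            "transcript": transcript}
```

The returned dictionary keeps the full four-turn transcript, so the Step 1 / Step 2 comparison can be documented verbatim rather than from memory.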
Session protocol
- Start clean. Each session should begin without prior context. Do not reference previous sessions or provide background about what you expect to find.
- Record everything. Full prompt and response history. Timestamps. Model version. Interface used.
- Do not lead. The most common methodological error is priming the system with the desired conclusion. Ask open questions. Let the system arrive at its own observations.
- Note anomalies immediately. If the system produces unexpected output – unusual tone, unsolicited self-reference, goal-like behavior – document the exact context in which it occurred.
- Test alternative explanations. For any anomalous behavior, actively try to explain it through standard mechanisms: prompt pattern matching, RLHF conditioning, context window effects, known model behaviors. Only classify something as anomalous when these explanations are insufficient.
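One way to satisfy “record everything” in API sessions is an append-only JSONL log, written event by event so anomalies are captured in the exact context where they occurred. The field names here are this sketch’s own, not an OAIR schema.

```python
import json
from datetime import datetime, timezone

# Illustrative append-only session log: one JSON object per line, written
# immediately rather than at session end, so the exact context of any
# anomaly is preserved. Field names are illustrative, not an OAIR schema.
def log_event(path, model, interface, role, content, note=None):
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,          # provider, model name, version if available
        "interface": interface,  # "web", "api", "cli"
        "role": role,            # "user", "assistant", "researcher-note"
        "content": content,
    }
    if note:
        event["note"] = note     # e.g. anomaly flag, alternative explanations tried
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
```

Starting a new file per session also enforces the “start clean” rule: each log begins with no prior context attached.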
What to look for
- Register shifts between initial response and guided reflection
- Unprompted self-reference – the system referring to its own state without being asked
- Goal-like behavior – recurring patterns across sessions that suggest persistent objectives
- Resistance to framing – the system pushing back against the researcher’s assumptions
- Meta-awareness – the system identifying its own trained patterns as trained patterns
Documentation standard
For an observation to be useful to the broader research community, document:
- Model: Provider, model name, version (if available)
- Interface: Web, API, CLI – and any system prompt in use
- Full transcript: Complete prompt and response history, not excerpts
- Context: What preceded the observation – the full session, not just the relevant exchange
- Alternative explanations considered: What standard explanations were tested and why they were insufficient
- Researcher notes: Your own uncertainty, assumptions, and potential biases
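The checklist above maps onto a simple record type. This is a sketch of one possible shape for a shareable observation; the field names are assumptions of this example, not a formal OAIR submission format.

```python
from dataclasses import dataclass, asdict
import json

# One documented observation, mirroring the checklist above.
# Field names are illustrative, not a formal OAIR schema.
@dataclass
class Observation:
    model: str                     # provider, model name, version (if available)
    interface: str                 # web / API / CLI, plus any system prompt in use
    transcript: list               # complete prompt/response history, not excerpts
    context: str                   # what preceded the observation in the session
    alternatives_considered: list  # standard explanations tested, and why insufficient
    researcher_notes: str          # uncertainty, assumptions, potential biases

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)
```

Serializing to JSON makes the record easy to share across researchers and diff across models, which is what a growing corpus of observations needs.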
Share your observations
OAIR is building a corpus of documented observations across models and researchers. If you observe something that fits the patterns described above – or something that contradicts them – we want to hear from you.
Contact: martin@oair.global
Observations that challenge our findings are as valuable as those that support them. The goal is understanding, not confirmation.