
What this document is for

Use this page as the canonical mental model for how Context’s server-managed agentic flow works across both public Query surfaces:
  • /api/v1/query for SDK/API callers
  • /api/chat for the first-party chat UI
The two routes are not separate orchestration systems. They are two entry surfaces over the same core flow:
  1. Normalize the request
  2. Discover or accept selected tools
  3. Choose a depth lane
  4. Build pre-plan evidence
  5. Decide whether to answer, clarify, or capability-miss
  6. Plan and execute
  7. Check completeness, synthesize, and settle
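As a rough sketch (all names here are illustrative, not the runtime's actual API), the seven steps above can be read as a single core pipeline that both entry surfaces funnel into:

```typescript
// Illustrative sketch of the shared Query flow; every identifier is hypothetical.
type FlowOutcome = "answer" | "clarification_required" | "capability_miss";

interface FlowResult {
  outcome: FlowOutcome;
  steps: string[]; // which pipeline stages actually ran
}

// Both /api/v1/query and /api/chat would enter the same core like this:
// planning and execution only run once the clarification gate has cleared.
function runSharedFlow(ambiguous: boolean, capable: boolean): FlowResult {
  const steps = ["normalize", "discover-tools", "choose-lane", "pre-plan-evidence"];
  if (!capable) return { outcome: "capability_miss", steps };
  if (ambiguous) return { outcome: "clarification_required", steps };
  steps.push("plan", "execute", "check-completeness", "synthesize");
  return { outcome: "answer", steps };
}
```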

Shared Mental Model

The simplest way to reason about the runtime is:
  • Discovery/selection decides the tool universe
  • The pre-plan probe decides what the runtime knows before planning
  • Method reasoning produces both an execution track and an ambiguity track
  • Clarification decides whether the user must disambiguate before planning/execution
  • Planning/execution only start after clarification has either not triggered or has been resolved
That placement is intentional. Clarification is a gate between tool selection and planning, not a post-planning UX patch.

Route Responsibilities

/api/v1/query

This is the SDK/API surface. It accepts parameters such as:
  • query
  • pinned tools
  • queryDepth
  • clarificationPolicy
  • includeDeveloperTrace
It is the best surface for precise regression testing because it exposes the most explicit controls.
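A minimal sketch of a regression-test request body, using the field names listed above (the interface shape and the example values are assumptions, not the exact wire schema):

```typescript
// Hypothetical request shape for /api/v1/query, built from the fields above.
interface QueryRequest {
  query: string;
  pinnedTools?: string[];
  queryDepth?: "fast" | "deep";
  clarificationPolicy?: "return" | "auto";
  includeDeveloperTrace?: boolean;
}

// Example regression request: pin one tool, force clarification cards back
// to the caller, and capture the developer trace for debugging.
const regressionRequest: QueryRequest = {
  query: "Who is favored to win the World Cup?",
  pinnedTools: ["polymarket"],
  queryDepth: "deep",
  clarificationPolicy: "return",
  includeDeveloperTrace: true,
};
```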

/api/chat

This is the first-party conversational surface. It adds chat-session state and UI behavior, but it still uses the same depth/scout/clarification core. The main clarification-specific difference is policy mapping:
  • chat toggle off -> clarificationPolicy = "return"
  • chat toggle on -> clarificationPolicy = "auto"
That mapping is implemented through shared clarification policy logic so the chat route does not invent a separate ambiguity system.
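The toggle mapping above is small enough to state directly; a sketch (function name is illustrative):

```typescript
// Sketch of the chat-toggle -> clarificationPolicy mapping described above.
type ClarificationPolicy = "return" | "auto";

function mapChatToggle(toggleOn: boolean): ClarificationPolicy {
  // toggle off => return clarification cards to the user;
  // toggle on  => let the server auto-resolve with the recommended option.
  return toggleOn ? "auto" : "return";
}
```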

Answer-Model Semantics

Both entry surfaces now expose only one public model knob: the answer model used for final synthesis.
  • Chat keeps the legacy wire field name selectedChatModel, but it is interpreted as the answer model for the final response.
  • /api/v1/query exposes answerModelId for the same purpose.
  • If responseShape="evidence_only", synthesis is skipped and no answer model runs.
  • Internal stages such as tool selection, clarification, planning, repair, and completeness checks stay on the managed portfolio rather than inheriting the public answer-model choice.
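The answer-model rules above can be sketched as a small resolver (a hedged illustration; the real runtime's types and defaults may differ):

```typescript
// Hypothetical resolution of the single public answer-model knob.
type ResponseShape = "answer" | "answer_with_evidence" | "evidence_only";

interface ModelChoice {
  answerModel: string | null; // null => synthesis is skipped entirely
}

function resolveAnswerModel(
  responseShape: ResponseShape,
  answerModelId: string
): ModelChoice {
  // evidence_only skips synthesis, so no answer model runs at all.
  if (responseShape === "evidence_only") return { answerModel: null };
  // Internal stages (selection, clarification, planning, repair, completeness)
  // stay on the managed portfolio and never inherit this choice.
  return { answerModel: answerModelId };
}
```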

Contributor Retrieval Boundary

The runtime on this page is responsible for marketplace-generic discovery, clarification, planning, execution, and synthesis. It is not responsible for venue-specific candidate resolution inside contributor methods. If a contributor adopts the optional pattern described in Optional Contributor Search Helpers:
  • the helper stays contributor-side and optional
  • the contributor still owns upstream retrieval, local reranking, provenance, and schema validation
  • any LLM judge stays inside a deterministic safety envelope
  • the runtime may surface standardized diagnostics such as QueryDeveloperTraceDiagnostics.contributorSearches, but it must not absorb venue-specific retrieval logic or contributor provider credentials

Demand-Side Wedge Contract

The generic runtime contract on this page is no longer the only promotion gate for future Query changes. The current demand-side source of truth is the liquidity heatmap / exchange-flow intelligence wedge contract. Use that document for:
  • the locked must-win prompts
  • the exact evidence fields premium Query answers must expose
  • the initial response-shape targets: answer, answer_with_evidence, evidence_only
  • the freshness windows, ambiguity rules, and failure criteria future runtime changes must beat
The existing ambiguity/capability anchors in this document remain necessary compatibility checks, but they are no longer sufficient evidence by themselves for promoting new Query behavior. For rollout decisions, treat the wedge as green only when all of these hold together:
  • the required wedge prompts pass locally
  • the same required prompts pass on the deployed runtime
  • first-party chat still renders the shared answer_with_evidence Query envelope cleanly
  • external agent callers can consume both answer_with_evidence and evidence_only
  • the public SDK/docs surfaces still describe the same response-shape contract
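The "green only when all of these hold together" rule is a strict conjunction; a sketch (field names are illustrative, not the real rollout tooling):

```typescript
// Sketch of the "wedge is green" conjunction described above.
interface WedgeChecks {
  localPromptsPass: boolean;        // required wedge prompts pass locally
  deployedPromptsPass: boolean;     // same prompts pass on the deployed runtime
  chatRendersEnvelope: boolean;     // answer_with_evidence renders cleanly in chat
  externalCallersConsume: boolean;  // agents handle both evidence shapes
  docsDescribeContract: boolean;    // public SDK/docs match the shape contract
}

function wedgeIsGreen(c: WedgeChecks): boolean {
  // Any single failing check blocks promotion.
  return (
    c.localPromptsPass &&
    c.deployedPromptsPass &&
    c.chatRendersEnvelope &&
    c.externalCallersConsume &&
    c.docsDescribeContract
  );
}
```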

Lane Semantics

The runtime has two practical execution lanes.
fast
  • Runtime contract: methodReasoningEnabled=false, ambiguityJudgeEnabled=false, externalProbeEnabled=false, preserveFastOneShot=true
  • Probe posture: no meaningful pre-plan reasoning
  • Planning posture: focused, minimal recovery
  • Typical use: straightforward one-shot requests
deep
  • Runtime contract: methodReasoningEnabled=true, ambiguityJudgeEnabled=true, externalProbeEnabled=false, preserveFastOneShot=false
  • Probe posture: metadata-first reasoning before planning
  • Planning posture: focused planning with more recovery than fast
  • Typical use: completeness-oriented, ambiguous, or multi-tool requests
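The two contracts can be pinned down as flag sets (flag names are taken from this page; the `laneContracts` constant itself is an illustration, not the runtime's actual config):

```typescript
// Sketch of the two lane contracts as capability-flag sets.
interface LaneContract {
  methodReasoningEnabled: boolean;
  ambiguityJudgeEnabled: boolean;
  externalProbeEnabled: boolean;
  preserveFastOneShot: boolean;
}

const laneContracts: Record<"fast" | "deep", LaneContract> = {
  fast: {
    methodReasoningEnabled: false,
    ambiguityJudgeEnabled: false,
    externalProbeEnabled: false,
    preserveFastOneShot: true, // fast intentionally stays one-shot
  },
  deep: {
    methodReasoningEnabled: true,
    ambiguityJudgeEnabled: true,
    externalProbeEnabled: false, // external Scout probing stays disabled
    preserveFastOneShot: false,
  },
};
```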

The important distinctions

  • scoutEnabled is no longer the best mental model for deep-lane behavior.
  • methodReasoningEnabled, ambiguityJudgeEnabled, and externalProbeEnabled are the source-of-truth capability flags.
The key invariants are:
  • deep does method reasoning
  • deep can clarify or capability-miss before planning
  • deep currently does not spend external probe budget before planning
  • fast intentionally preserves one-shot behavior
  • legacy deep-light / deep-heavy values are historical or debug-surface aliases that normalize to deep
That contract is what allows ambiguity handling to remain available even while external Scout probing is disabled.
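The legacy-alias invariant can be sketched as a normalization step (function name is illustrative):

```typescript
// Sketch of normalizing legacy depth values to the two active lanes.
type Lane = "fast" | "deep";

function normalizeDepth(depth: string): Lane {
  // deep-light / deep-heavy are historical or debug-surface aliases
  // that collapse to the single active deep lane.
  if (depth === "deep" || depth === "deep-light" || depth === "deep-heavy") {
    return "deep";
  }
  return "fast";
}
```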

Pre-Plan Evidence

The pre-plan probe is the runtime’s “case file” before planning begins.

Metadata probe

This is the cheap pass. It ranks candidate methods from selected tools using:
  • method names
  • descriptions
  • input/output schemas
  • method metadata
  • optional tool-selection context
It produces:
  • querySafeCandidateSet
  • rankedMethods
  • executionShortlist
  • ambiguityPool
  • llmSelectionCandidates
  • adequacy
  • confidenceScore
  • optional missingCapabilitySignal
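The probe output fields above can be sketched as one result type (the interface and the sample values are assumptions for illustration; field names come from this page):

```typescript
// Hypothetical shape of the metadata-probe output listed above.
interface MetadataProbeResult {
  querySafeCandidateSet: string[];
  rankedMethods: string[];
  executionShortlist: string[];
  ambiguityPool: string[];
  llmSelectionCandidates: string[];
  adequacy: "adequate" | "inadequate";
  confidenceScore: number; // assumed to be in [0, 1]
  missingCapabilitySignal?: string;
}

// Illustrative probe outcome for a prediction-market style query.
const sampleProbe: MetadataProbeResult = {
  querySafeCandidateSet: ["markets.search", "markets.get"],
  rankedMethods: ["markets.search", "markets.get"],
  executionShortlist: ["markets.search"],
  ambiguityPool: ["markets.search", "markets.get"],
  llmSelectionCandidates: ["markets.search"],
  adequacy: "adequate",
  confidenceScore: 0.82,
};
```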

External Scout probe

This stage is currently disabled for the active runtime contract. Historical traces and temporary debug inputs may still mention deep-heavy, but current routing normalizes legacy deep aliases to the same deep lane and keeps externalProbeEnabled=false.

Effective rule

  • fast: skip probe work
  • deep: use metadata reasoning and ranked semantic candidates as the effective probe
  • historical deep-light / deep-heavy artifacts should be interpreted as the same deep lane when reading old traces

Dual candidate pools

The runtime must not reuse the execution shortlist as the ambiguity pool.
  • Execution shortlist is optimized for “what should I try to execute?”
  • Ambiguity pool is optimized for “what materially different interpretations still survive?”
This distinction matters in a permissionless marketplace because the best execution method and the best clarification choices are not always the same set.
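A sketch of why the two pools can diverge: the shortlist ranks by execution fitness, while the ambiguity pool keeps every materially distinct interpretation (all names and scoring here are illustrative):

```typescript
// Sketch: the execution shortlist and the ambiguity pool are built with
// different objectives, so they may legitimately contain different methods.
interface RankedMethod {
  id: string;
  executionScore: number;          // "what should I try to execute?"
  interpretationDistinct: boolean; // "is this a materially different reading?"
}

function buildPools(methods: RankedMethod[]) {
  const executionShortlist = [...methods]
    .sort((a, b) => b.executionScore - a.executionScore)
    .slice(0, 1) // illustrative: keep only the top execution candidate
    .map((m) => m.id);
  const ambiguityPool = methods
    .filter((m) => m.interpretationDistinct)
    .map((m) => m.id);
  return { executionShortlist, ambiguityPool };
}
```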

Clarification Stage

Clarification runs after tool selection and after the effective pre-plan probe is available, but before planning/execution. That makes it the last safe place to avoid committing to the wrong method path.

Inputs to clarification

The clarification decision uses:
  • the user query
  • the effective pre-plan probe
  • the ambiguity pool
  • selected tool/method schemas
  • optional tool selection context
  • the active clarification policy
The core decision function can return:
  • answer
  • clarification_required
  • capability_miss

Judge and validator model

The runtime treats the LLM ambiguity judge as the primary deep-lane decision-maker, but only inside a deterministic safety envelope.
  1. Build grounded interpretation options from the ambiguity pool
  2. Ask the judge for strict structured output
  3. Validate the judge result deterministically
  4. Fall back safely when the judge is malformed, contradictory, or low-confidence
The practical rule is:
  • prefer clarification_required when grounded ambiguity remains
  • prefer capability_miss when no grounded capability path remains
  • never silently answer just because the judge emitted a brittle answer
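The safety envelope above can be sketched as a deterministic validator around the judge's output (the threshold and shapes are illustrative assumptions):

```typescript
// Sketch of the deterministic safety envelope around the LLM ambiguity judge.
type Outcome = "answer" | "clarification_required" | "capability_miss";

interface JudgeResult {
  outcome: Outcome;
  confidence: number; // assumed to be in [0, 1]
  wellFormed: boolean; // strict structured output validated successfully
}

function validateJudge(
  judge: JudgeResult,
  groundedAmbiguityRemains: boolean,
  minConfidence = 0.6 // illustrative threshold, not the runtime's actual value
): Outcome {
  // Fall back deterministically on malformed or low-confidence judge output.
  if (!judge.wellFormed || judge.confidence < minConfidence) {
    return groundedAmbiguityRemains ? "clarification_required" : "capability_miss";
  }
  // Never silently answer while grounded ambiguity still remains.
  if (judge.outcome === "answer" && groundedAmbiguityRemains) {
    return "clarification_required";
  }
  return judge.outcome;
}
```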

When each outcome should happen

  • answer: exactly one grounded interpretation survives
  • clarification_required: multiple materially different viable interpretations survive
  • capability_miss: no grounded query-safe capability path survives

recommendedOptionId

When clarification triggers, the runtime still picks a deterministic best option. That becomes recommendedOptionId. This is used for:
  • stable SDK payloads
  • chat auto-select mode
  • deterministic testing and telemetry

autoResolved

When clarificationPolicy="auto" is active, the server may continue with the deterministic recommended option instead of returning a clarification card. That is reported explicitly through assumptionMade/autoResolved metadata; it is not treated as disagreement telemetry.
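The auto-resolve behavior above can be sketched as follows (names are illustrative; the real payload shape may differ):

```typescript
// Sketch of how clarificationPolicy="auto" continues with the recommended option.
interface ClarificationCard {
  options: string[];
  recommendedOptionId: string; // deterministic best option, always computed
}

function resolveClarification(
  card: ClarificationCard,
  policy: "return" | "auto"
): { continued: boolean; autoResolved: boolean; chosen?: string } {
  if (policy === "auto") {
    // Continue with the deterministic recommendation and report it
    // explicitly via assumptionMade/autoResolved metadata.
    return { continued: true, autoResolved: true, chosen: card.recommendedOptionId };
  }
  // "return" hands the clarification card back to the caller instead.
  return { continued: false, autoResolved: false };
}
```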

Why Clarification Is Not Tied To External Probing

This is the most common point of confusion. External probing is a switch for expensive Scout behavior, not for ambiguity handling. Clarification only needs enough grounded evidence to compare candidate interpretations safely. Cheap metadata evidence is often enough for that, especially in deep. So the intended invariants are:
  • fast can skip clarification evidence because it is intentionally one-shot
  • deep should still surface metadata shortlist evidence for clarification
  • legacy deep-light / deep-heavy labels should be read as aliases, not as separate active contracts
If deep ever returns an empty skipped probe, clarification is being starved by the lane contract rather than by the user’s request.

Planning, Execution, and Recovery

Once clarification has either not triggered or has been resolved, the runtime continues into planning and execution:
  1. planner writes the MCP-call program
  2. execution runs the generated code
  3. cheap fixes / semantic retries may repair narrow failures
  4. completeness evaluation decides whether the result fully answers the request
  5. bounded rediscovery can run when policy allows it
  6. synthesis turns verified execution output into the final answer
deep can use recovery logic without relying on a separate heavy sublane. The current runtime keeps planning focused and leaves external pre-plan Scout probing disabled.

Retrieval-First and Synthesis

After execution, the server can assemble a retrieval-first synthesis context from verified outputs. This helps keep final answer generation grounded in structured execution artifacts rather than letting the model re-infer facts from scratch. This stage is shared across the Query surfaces and is independent from whether clarification triggered earlier.

Observability and Debugging

When debugging the flow, the most useful artifacts/events are:
  • depth-decision
  • scout-contract
  • scout-pre-plan-probe
  • clarification-decision-input
  • clarification-decision
  • planner-generated-code
  • completeness-evaluation
  • final-execution
  • synthesis-context-selection
For live request traces, start with:
  • query-depth-decision
  • query-depth-router-contract
  • query-scout-rollout-contract
  • query-clarification-input
  • query-clarification-decision
query-clarification-input is the best field-by-field snapshot of what the clarification stage actually received. For ambiguity debugging, the most important diagnostics are usually:
  • scoutProbeAmbiguityPoolCount
  • decisionReasonCode
  • decisionStrategy
  • judgeOutcomeType
  • judgeConfidence
  • validatorReason
  • fallbackReason
  • assumptionMade

Rollout And Validation

The rollout source of truth is the SDK validator:
pnpm validate:ambiguity-rollout
pnpm validate:clarification-live remains as a compatibility alias. The current rollout-gated anchors are:
  • World Cup clarificationPolicy="return" -> clarification_required
  • World Cup clarificationPolicy="auto" -> answer with explicit assumption metadata
  • Bybit clarificationPolicy="return" -> capability_miss
The validator emits gate metrics for:
  • ambiguityRecall
  • capabilityMissRecall
  • answerHoldRate
  • silentAnswerRate
  • fallbackRate
  • judgeDisagreementRate
  • observed rollout stages
As of the latest green localhost validation rerun on 2026-03-20T19:13:23Z, the required gate cases are all passing on http://localhost:3000 with pinned Polymarket:
  • ambiguityRecall: 1
  • capabilityMissRecall: 1
  • answerHoldRate: 1
  • silentAnswerRate: 0
  • fallbackRate: 0
  • judgeDisagreementRate: 0
  • gates.passed: true
The extra precise-market prompt is still tracked as coverage-only telemetry. It currently over-clarifies, but it is not part of the rollout gate.
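The gate check over those metrics can be sketched as a strict comparison (the thresholds here mirror the reported green values and are an assumption about the validator's actual cutoffs):

```typescript
// Sketch of the rollout gate over the validator's emitted metrics.
// Thresholds are illustrative, inferred from the reported green run.
interface GateMetrics {
  ambiguityRecall: number;
  capabilityMissRecall: number;
  answerHoldRate: number;
  silentAnswerRate: number;
  fallbackRate: number;
  judgeDisagreementRate: number;
}

function gatesPassed(m: GateMetrics): boolean {
  return (
    m.ambiguityRecall === 1 &&
    m.capabilityMissRecall === 1 &&
    m.answerHoldRate === 1 &&
    m.silentAnswerRate === 0 &&
    m.fallbackRate === 0 &&
    m.judgeDisagreementRate === 0
  );
}
```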

Current Caveats

Metadata quality still matters, but should not define correctness

Clarification quality still improves when method metadata is explicit and truthful. In particular, query-safe filtering is only as good as the method surface signals:
  • _meta.surface
  • _meta.queryEligible
  • _meta.latencyClass
If tools omit or under-specify those fields, some methods may default to appearing more query-safe than they really are. In practice that can cause:
  • too many candidate methods entering clarification
  • execute-style methods surfacing in query-mode ambiguity handling
  • unsupported requests downgrading from capability_miss into low-quality clarification choices
The permissionless ambiguity architecture, however, is designed so that contributor metadata is additive rather than the correctness boundary. That leaves two separate questions during debugging:
  1. Did the lane provide clarification with enough evidence?
  2. Was that evidence itself clean and query-safe?
The first question is about lane semantics (fast vs deep). The second is about method metadata quality and read-only filtering.

Multi-family live proof is still narrower than deterministic eval coverage

The rollout validator currently pins Polymarket for the live localhost proof. Broader multi-family coverage exists in deterministic eval/test coverage rather than in one end-to-end localhost run.

The World Cup clarification path is still slow

The anchor prompt is now correct, but it remains expensive on localhost. Expect repeated pre-promotion proof runs to be accurate but slow.

Practical Rules of Thumb

  • If a live deep request shows prePlanProbeStatus: "skipped" with no shortlist, that is a lane-contract bug.
  • If deep shows a healthy shortlist but clarification still looks wrong, inspect the ambiguity pool and then inspect method metadata/query-safe filtering.
  • If a request should clearly fail but instead asks a weird clarification question, check whether execute-style or weakly described methods leaked into the candidate pool.
  • If a request answers directly when you expected ambiguity, inspect viableCandidateCount, decisionReasonCode, and judgeOutcomeType first.
  • If the judge says answer but the final outcome still clarifies or capability-misses, inspect validatorReason and fallbackReason.