What This Pattern Is
Some contributor venues expose great primitives but weak native search. In those cases, a contributor may optionally add a contributor-side search helper that turns a broad natural-language request into a bounded candidate set and, when needed, uses a cheap LLM judge to pick the best match. This pattern is intentionally optional:
- Context marketplace discovery stays generic.
- Contributor servers stay responsible for venue-specific retrieval inside their own methods.
- Contributors with strong upstream search should usually not adopt this helper at all.
Frozen Product Boundaries
The contract is locked to these boundaries:
- Context runtime discovers tools and methods, not venue-specific market/entity candidates.
- The helper runs inside the contributor server or contributor-side SDK helper layer.
- Deterministic candidate gathering, provenance, normalization, and schema validation stay contributor-owned and auditable.
- If the helper surfaces diagnostics to the runtime, they must be standardized trace metadata rather than venue-specific control flow leaking into the marketplace contract.
- The helper does not change client.query.run(), client.tools.execute(), or the public Query/Execute billing model.
When To Use It
Use this pattern only when all of the following are true:
- the contributor wraps a venue where the upstream search surface is weak, noisy, or incomplete
- deterministic-only ranking has already proven insufficient on real prompts
- the contributor still has a deterministic way to fetch, normalize, and validate candidates before and after any model judgment
Do not use this pattern when any of the following are true:
- the upstream venue already exposes reliable direct search and candidate resolution
- the venue can be solved by a small deterministic wrapper without model judgment
- the contributor cannot return honest degraded outcomes when the judge is slow, malformed, unavailable, or over budget
Contributor Adoption Checklist
Before you enable this helper for a venue, lock all of the following:
- the exact named prompt families you are trying to resolve, plus one generic overlap fixture, one still-ambiguous fixture, and one capability-miss fixture
- the strongest public retrieval surface(s) you will hit before any model judgment
- the deterministic constraints that must still hold before and after the judge runs
- the default provider/model, timeout, budget, disable switch, and degraded-outcome policy you will expose to contributors
- how the extra LLM spend will be recovered in your own listing response price and/or execute pricing
- where you will save replayable machine-readable validation artifacts for later agents and rollout reviews
Frozen Helper Flow
The helper contract is locked to this shape:
- Distill the request into one or more compact search intents or clause-owned search queries.
- Hit the venue’s strongest public retrieval surface.
- Deterministically normalize, deduplicate, and locally rank the candidate pool.
- Ask a cheap judge model to return strict JSON that names the primary, related, and rejected candidates.
- Deterministically validate the judge output against the actual shortlist.
- Fetch full details only for the shortlisted candidates.
- Return structured answer data plus search provenance, confidence, and any degraded-outcome signal.
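The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SDK's implementation: Candidate, normalize_and_rank, validate_judgment, resolve, and the judge's "primary"/"rejected" keys are hypothetical names, and the distill, retrieval, and detail-fetch steps are collapsed into an injected search callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    id: str
    title: str
    score: float  # local deterministic relevance score (hypothetical field)


def normalize_and_rank(raw: list[dict], limit: int = 5) -> list[Candidate]:
    """Normalize ids, drop duplicates (keeping the best score), rank locally."""
    best: dict[str, Candidate] = {}
    for item in raw:
        cid = str(item["id"]).strip().lower()
        cand = Candidate(cid, str(item["title"]).strip(), float(item.get("score", 0.0)))
        if cid not in best or cand.score > best[cid].score:
            best[cid] = cand
    # Sort by score descending, then by id, so ties are reproducible.
    return sorted(best.values(), key=lambda c: (-c.score, c.id))[:limit]


def validate_judgment(judgment: dict, shortlist: list[Candidate]) -> Optional[str]:
    """Accept the judge's primary id only if it is in the real shortlist
    and the judge did not also list it as rejected."""
    ids = {c.id for c in shortlist}
    primary = judgment.get("primary")
    if not isinstance(primary, str) or primary not in ids:
        return None
    rejected = judgment.get("rejected")
    if isinstance(rejected, list) and primary in rejected:
        return None
    return primary


def resolve(query: str,
            search: Callable[[str], list[dict]],
            judge: Callable[[str, list[Candidate]], dict]) -> dict:
    shortlist = normalize_and_rank(search(query))
    if not shortlist:
        return {"outcome": "capability_miss", "selectedCandidateId": None, "shortlist": []}
    try:
        judgment = judge(query, shortlist)
    except Exception:
        judgment = {}  # a failed judge must not manufacture a selection
    if not isinstance(judgment, dict):
        judgment = {}
    primary = validate_judgment(judgment, shortlist)
    return {
        "outcome": "selected" if primary else "shortlist_only",
        "selectedCandidateId": primary,
        "shortlist": [c.id for c in shortlist],
    }
```

The judge only ever names ids; the deterministic validator decides whether that pick is allowed to stand, which keeps the model out of the auditable path.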
Allowed Outcomes
The helper may return only these outcome classes:
- selected: one validated candidate survived and the contributor can proceed with a bounded, honest confidence signal
- shortlist_only: the helper could not honestly select one candidate, so it returns a bounded shortlist plus the degradation reason
- capability_miss: no viable candidate can satisfy the requested venue, market family, or capability
The following rules apply:
- A malformed, contradictory, slow, unavailable, disabled, or over-budget judge must never silently manufacture certainty.
- The helper may return a low-confidence selected result only when one best candidate still survives deterministic validation and that low confidence is surfaced honestly.
- If the judge fails and no uniquely grounded candidate remains, the helper must degrade to shortlist_only or capability_miss.
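One way to read these rules is as a small decision function. The sketch below is an assumption-laden illustration: decide_outcome and its parameters are hypothetical, the 0.2 low-confidence value is arbitrary, and treating a single surviving shortlist entry as "uniquely grounded" is one possible interpretation rather than the contract's definition.

```python
from typing import Optional


def decide_outcome(shortlist_ids: list[str],
                   validated_primary: Optional[str],
                   judge_ok: bool,
                   judge_confidence: float = 0.0) -> dict:
    """Map deterministic facts onto one of the three allowed outcome classes."""
    if not shortlist_ids:
        # Nothing viable was retrieved at all.
        return {"outcome": "capability_miss", "selected": None,
                "confidence": 0.0, "degraded": True}

    if judge_ok and validated_primary in shortlist_ids:
        # The judge answered and its pick survived deterministic validation.
        return {"outcome": "selected", "selected": validated_primary,
                "confidence": judge_confidence, "degraded": False}

    if not judge_ok and len(shortlist_ids) == 1:
        # Assumption: exactly one surviving candidate counts as uniquely grounded.
        # Confidence stays explicitly low and the result is flagged as degraded.
        return {"outcome": "selected", "selected": shortlist_ids[0],
                "confidence": 0.2, "degraded": True}

    # Judge failed or was rejected by the validator, and no single winner remains.
    return {"outcome": "shortlist_only", "selected": None,
            "confidence": 0.0, "degraded": True}
```

Note that no branch raises confidence when the judge fails; the only way to reach an undegraded selected result is a judge pick that passes validation.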
Provider And Cost Policy
Provider/model behavior is frozen to the following rules:
- The helper core accepts an injected judge implementation or equivalent config-driven adapter boundary.
- Provider choice is contributor-configurable; OpenRouter is only the reference adapter for this phase.
- Contributors may override provider, model, timeout, budget, and disable-switch behavior without changing helper call sites.
- Billing is contributor BYOK only in this phase. Any extra model spend belongs to the contributor server.
- The platform does not provide hosted judge routing, deferred judge reimbursement, or platform-funded model fallback in this rollout.
- If a design spike or local validation run exercises a judge, use glm-turbo-model locally unless you are intentionally comparing models.
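The injected-judge boundary described above could look roughly like the following. Judge, JudgeConfig, FirstCandidateJudge, and run_judge are hypothetical names for illustration only, and the timeout and budget defaults are placeholder values; the sketch shows the injection point and the disable switch, not timeout or budget enforcement.

```python
from dataclasses import dataclass
from typing import Protocol


class Judge(Protocol):
    """Anything with this method can be injected; no provider is hard-coded."""
    def judge(self, query: str, candidate_ids: list[str]) -> dict: ...


@dataclass
class JudgeConfig:
    provider: str = "openrouter"      # reference adapter named in this document
    model: str = "glm-turbo-model"    # local default named in this document
    timeout_s: float = 5.0            # placeholder value
    max_budget_usd: float = 0.01      # placeholder value
    enabled: bool = True              # the disable switch


class FirstCandidateJudge:
    """Stand-in judge for local tests; a real adapter would call the provider."""
    def judge(self, query: str, candidate_ids: list[str]) -> dict:
        return {"primary": candidate_ids[0], "related": candidate_ids[1:], "rejected": []}


def run_judge(judge: Judge, config: JudgeConfig, query: str, candidate_ids: list[str]) -> dict:
    # The call site never changes; only the injected judge and config do.
    if not config.enabled:
        return {"status": "skipped", "degradedReasonCode": "judge_disabled"}
    return {"status": "ok", "judgment": judge.judge(query, candidate_ids)}
```

Swapping providers then means passing a different object that satisfies Judge, while run_judge and its callers stay untouched.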
Pricing Responsibility In Practice
- Context continues to bill the normal Query and Execute surfaces only. The helper does not create a new marketplace billing surface.
- Any incremental judge spend comes from the contributor’s own provider credentials, so the contributor should recover it through listing response pricing and, when relevant, execute pricing.
- Keep the judge cheap enough that honest degradation remains viable. A helper that only works when every call burns an oversized model budget is not ready for rollout.
Public Semantics That Must Match Across SDKs
These semantics are now locked across the TypeScript and Python SDKs:
- the helper stays contributor-side and optional in both languages
- the same high-level knobs must exist in both languages: provider/model override, timeout, budget, disable switch, and degraded-outcome policy
- the same result concepts must exist in both languages: shortlisted candidates, selected candidate, related/rejected candidates, confidence, provenance, and degraded-outcome signals
- the same compact provenance envelope must exist in both languages: searchMetadata with selectedCandidateId, shortlist/related/rejected candidate ids, candidate counts, judge snapshot, provenance sources, and validator trace summary
- the same machine-readable validation artifact shape must exist in both languages: top-level case metadata plus a nested resolution block with selectedCandidateId, shortlist/related/rejected ids, confidence, reason, and degradedReasonCode
- developer traces must expose aggregated contributor search provenance at QueryDeveloperTraceDiagnostics.contributorSearches
- a semantic mismatch between the two SDKs is a contract bug, not an acceptable language difference
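To make the validation artifact shape concrete, here is a rough sketch of building and serializing one. Only resolution, selectedCandidateId, confidence, reason, and degradedReasonCode come from the text above; caseId, shortlistCandidateIds, relatedCandidateIds, rejectedCandidateIds, and both function names are assumptions chosen for illustration and may differ from what the SDKs emit.

```python
import json
from typing import Optional


def build_artifact(case_id: str, shortlist: list[str], selected: Optional[str],
                   confidence: float, reason: str,
                   related: Optional[list[str]] = None,
                   rejected: Optional[list[str]] = None,
                   degraded_reason: Optional[str] = None) -> dict:
    """Build an artifact: top-level case metadata plus a nested resolution block."""
    if selected is not None and selected not in shortlist:
        raise ValueError("selected candidate must come from the shortlist")
    return {
        "caseId": case_id,  # assumed key for the top-level case metadata
        "resolution": {
            "selectedCandidateId": selected,
            "shortlistCandidateIds": list(shortlist),  # assumed key name
            "relatedCandidateIds": list(related or []),  # assumed key name
            "rejectedCandidateIds": list(rejected or []),  # assumed key name
            "confidence": confidence,
            "reason": reason,
            "degradedReasonCode": degraded_reason,
        },
    }


def to_replayable_json(artifact: dict) -> str:
    # Sorted keys keep the serialized artifact stable from run to run.
    return json.dumps(artifact, sort_keys=True, indent=2)
```

Keeping the JSON keys in camelCase in both languages, even when the Python function arguments are snake_case, is one way to hold the JSON-facing shape identical across SDKs.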
TypeScript And Python Adoption Surface
Both SDKs now ship contributor-focused helper modules:
- TypeScript can stay camelCase and Python can stay snake_case at the API surface, but the JSON-facing searchMetadata, validation artifact fields, and degraded reason codes must remain semantically identical.
- Provider/model overrides, timeout, budget, disable switch behavior, and degraded-outcome policy must merge in the same order in both SDKs: helper defaults -> contributor defaults -> per-call overrides.
- Runtime developer traces expose aggregated contributor helper provenance at QueryDeveloperTraceDiagnostics.contributorSearches; that is the runtime visibility contract, not a venue-specific _meta field.
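The merge order (helper defaults -> contributor defaults -> per-call overrides) can be illustrated with a small layered merge. HELPER_DEFAULTS, its key names and values, and merge_judge_config are hypothetical; treating None as "not set" is also an assumption of this sketch rather than documented SDK behavior.

```python
from typing import Any, Mapping, Optional

# Hypothetical helper-level defaults; keys and values are placeholders.
HELPER_DEFAULTS: dict[str, Any] = {
    "provider": "openrouter",
    "model": "glm-turbo-model",
    "timeoutMs": 4000,
    "budgetUsd": 0.01,
    "disabled": False,
    "degradedOutcomePolicy": "shortlist_only",
}


def merge_judge_config(contributor_defaults: Optional[Mapping[str, Any]] = None,
                       per_call: Optional[Mapping[str, Any]] = None) -> dict[str, Any]:
    """Later layers win: helper defaults -> contributor defaults -> per-call overrides."""
    merged = dict(HELPER_DEFAULTS)
    for layer in (contributor_defaults or {}, per_call or {}):
        # A None value means "not set" here, so it never erases an earlier layer.
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged
```

For example, a contributor default of timeoutMs=2000 combined with a per-call timeoutMs=500 yields 500, while any knob neither layer sets keeps the helper default.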
Packaging Decision
The core helper exports now ship at these SDK-owned paths:
- TypeScript core helper surface: @ctxprotocol/sdk/contrib/search
- Python core helper surface: ctxprotocol.contrib.search
- the core helper surface lives inside each SDK as a contributor-focused module rather than as a new marketplace protocol requirement
- provider-specific adapters will sit alongside that core helper surface rather than inside the marketplace runtime
- OpenRouter remains the first planned reference adapter, but the contract must remain provider-agnostic
Degraded Outcomes In Practice
Use degraded outcomes to stay honest, not as a hidden retry loop:
- selected: one deterministically valid best candidate survives. This may still carry low confidence, but only when that low confidence is surfaced explicitly.
- shortlist_only: grounded candidates remain, but the helper cannot honestly pick one winner. Common reason codes include judge_disabled, judge_timeout, judge_budget_exceeded, judge_invalid_output, and validator_rejected.
- capability_miss: no viable candidate satisfies the requested venue, market family, or capability.
The degraded signal is carried in these fields:
- resolution.degradedReasonCode
- searchMetadata.degraded
- searchMetadata.judge
- searchMetadata.trace
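As an illustration of how a judge failure might map onto the reason codes listed above, consider the sketch below. classify_judge_result, its flags, and the "primary" key in the judge's JSON are assumptions; the reason-code strings are the ones named in this section.

```python
import json
from typing import Optional


def classify_judge_result(raw_output: Optional[str],
                          shortlist_ids: list[str],
                          *,
                          disabled: bool = False,
                          timed_out: bool = False,
                          over_budget: bool = False) -> tuple[Optional[str], Optional[str]]:
    """Return (primary_id, degraded_reason_code); exactly one of the two is None."""
    if disabled:
        return None, "judge_disabled"
    if timed_out:
        return None, "judge_timeout"
    if over_budget:
        return None, "judge_budget_exceeded"
    try:
        parsed = json.loads(raw_output)  # the judge is expected to return strict JSON
    except (TypeError, ValueError):
        # TypeError covers a missing output; ValueError covers malformed JSON.
        return None, "judge_invalid_output"
    if not isinstance(parsed, dict) or not isinstance(parsed.get("primary"), str):
        return None, "judge_invalid_output"
    if parsed["primary"] not in shortlist_ids:
        # Well-formed output that names a candidate outside the real shortlist.
        return None, "validator_rejected"
    return parsed["primary"], None
```

Each failure path returns a code instead of retrying, so the caller can record it in resolution.degradedReasonCode and degrade to shortlist_only or capability_miss.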
Saved Validation Evidence
Use the checked-in artifact directories as the current replayable evidence base:
- TypeScript Polymarket pilot: context-sdk/examples/server/polymarket-contributor/validation/
- TypeScript Kalshi pilot: context-sdk/examples/server/kalshi-contributor/validation/
- Python parity fixtures: context-sdk-python/examples/client/validation/
Fields to check in each artifact:
- resolution.selectedCandidateId, shortlist/related/rejected ids, and degradedReasonCode
- searchMetadata.judge.usage for observed tokens, cost, and latency
- searchMetadata.provenance and searchMetadata.trace for retrieval sources and validator behavior
Validation Anchors For Later Batches
Future implementation work must keep passing these anchors:
- Polymarket: What are Polymarket's implied odds on US or allied boots on the ground in Iran—which specific markets, resolution dates, and price levels matter, and how should I think about that for broader risk-off positioning?
- Kalshi: What does Kalshi imply about the Supreme Court tariffs case, and which exact market should I inspect?
- Generic overlap fixture: multiple overlapping candidates exist but one best match should win
- Still-ambiguous fixture: low confidence or clarification survives instead of fake certainty
- Capability-miss fixture: no candidate can satisfy the requested venue or capability
Rollout Recommendation
Do not recommend this helper as the default marketplace pattern just because it now ships in both SDKs. Broader rollout stays gated until all of these hold at the same time:
- TypeScript and Python surfaces still match on config knobs, searchMetadata, validation-artifact semantics, and developer-trace visibility.
- The TypeScript contributor pilots keep passing the Polymarket and Kalshi anchors with bounded judge cost and bounded judge latency visible in saved artifacts.
- Python parity fixtures keep passing the same named and generic behavior classes, including still-ambiguous and capability-miss cases.
- Checked-in machine-readable artifacts remain replayable in all three validation directories.
- Degraded outcomes stay honest in both traces and artifacts instead of being flattened into fake certainty.

