Benchmark date: April 2026 · Sample: 10 live crypto queries · Context side: production ContextClient.query.run() on the live marketplace · Baseline: raw Gemini web chat with no tools and no browsing.

The one-sentence version

Free LLMs don’t refuse questions they can’t ground. They answer them anyway — with confident, plausible, specific numbers that look live and aren’t. Context returns grounded answers with a verifiable source trail, or it says it can’t.

Why this matters more than “LLMs are stale”

The usual pitch

“Generic LLMs can’t access live data.” True, but weak. Every AI company says this. Buyers have heard it a thousand times and it doesn’t prove you’re different — just that your competitor shipped a tool-use loop.

The real pitch

“Generic LLMs will confidently make up live market data rather than refuse. Context won’t — it either returns grounded numbers or it tells you the marketplace can’t answer.” That’s a trust argument, not a capability argument. And it’s the argument agents actually care about when they’re settling payments against a response.

Headline result

High differentiation

6 / 10 queries
Context returned a grounded answer with exact numbers, source venues, or verifiable URLs. Raw Gemini produced synthetic-looking “current” market narration with no grounding.

Moderate

1 / 10 queries
Both answered, but Context was safer: it surfaced insufficient_data rather than overclaiming, where Gemini produced a confident-but-ungrounded liquidation narrative.

Low / gap

3 / 10 queries
Venue-specific Hyperliquid prompts where Context’s current marketplace coverage returned capability_miss. These are carry-forward targets, not shipped wins.
The 6 high-differentiation wins are where you should see the wedge clearly. The 3 capability-misses are where the marketplace still needs to expand. We ship both numbers because the gaps are as informative as the wins.
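In code terms, the three buckets above map onto response outcomes an agent can gate on before it acts. A minimal TypeScript sketch, assuming a hypothetical result shape (the `status` and `sources` field names are illustrative, not the SDK's confirmed schema):

```typescript
// Hypothetical result shape; illustrative only, not the confirmed SDK schema.
type QueryResult = {
  status: "grounded" | "insufficient_data" | "capability_miss";
  answer?: string;
  sources?: string[]; // verifiable source trail, present when grounded
};

// An agent paying per response should only settle against answers
// that are both grounded and carry a source trail.
function shouldActOn(result: QueryResult): boolean {
  return result.status === "grounded" && (result.sources?.length ?? 0) > 0;
}
```

The point of the benchmark is that the baseline LLM offers no equivalent of this gate: every answer arrives in the same confident register, grounded or not.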

Side-by-side: every query

1. BTC flows + news context

Prompt
“Is BTC flowing into or out of exchanges on Coinglass? Search for news explaining the flow.”
Grounded answer combining an on-chain flow tool with a news search tool:
  • +1,599.54 BTC 1-day net inflow
  • +4,703.98 BTC 7-day net inflow
  • -19,768.55 BTC 30-day net outflow
  • News context explaining the shift in the 7-day trend
Differentiation: High — This is the cleanest “impossible without us” example. Context is a single answer with live flow data and supporting news. The baseline is vibes.
2. BTC exchange balance pressure

Prompt
“Analyze exchange balance pressure for BTC and tell me whether flows look bullish or bearish right now.”
bearish_distribution_pressure verdict, score 35/100, with concrete 1d / 7d / 30d flow metrics and exchange concentration data.
Differentiation: High
3. Derivatives market overview

Prompt
“Give me a crypto derivatives market overview right now, including sentiment and key risk signals.”
Multi-tool answer with a concrete RISK_OFF regime call, Fear & Greed 11, active liquidation hazard, crowding metrics, and trigger-level recommendations.
Differentiation: High
4. Open-interest divergence scan

Prompt
“Scan BTC, ETH, SOL, and XRP for open-interest divergence and tell me which setup looks the most stretched.”
Grounded OI divergence scan with explicit per-asset deltas and a clear conclusion: SOL was the most actionable stretched setup at the time of the run.
Differentiation: High — Context does real scan-and-rank retrieval. The baseline mimics the format.
5. BTC valuation score

Prompt
“Give me a BTC valuation score using AHR999, rainbow chart, stock-to-flow, and bubble indicators.”
Grounded answer with an 8/10 valuation score, explicit AHR999, Fear & Greed, Puell Multiple, and Bubble Index values — plus an explicit note that Rainbow Chart and Stock-to-Flow values were not found in the retrieved dataset.
Differentiation: High — Context was more conservative than Gemini by flagging missing data. That conservatism is the trust signal.
6. Polymarket market discovery

Prompt
“Find live crypto markets on Polymarket with total volume above $10M and liquidity above $100k.”
One live qualifying market identified:
  • Market: What price will Bitcoin hit in 2026?
  • Total volume: exact dollar figure
  • Liquidity: exact dollar figure
  • 24h volume: exact dollar figure
  • Direct URL returned
Differentiation: High — The most commercial prompt in the set: answer a real market-discovery question with a tradable venue and URL, or don’t.
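Mechanically, that discovery prompt reduces to a threshold scan over listed markets. A hedged sketch of the filter, with an assumed Market shape (field names are illustrative, not Polymarket's or the marketplace's real schema):

```typescript
// Illustrative market record; field names are assumptions for this sketch.
interface Market {
  question: string;
  totalVolumeUsd: number;
  liquidityUsd: number;
  url: string;
}

// Apply the prompt's thresholds: total volume above $10M, liquidity above $100k.
function qualifyingMarkets(markets: Market[]): Market[] {
  return markets.filter(
    (m) => m.totalVolumeUsd > 10_000_000 && m.liquidityUsd > 100_000,
  );
}
```

The hard part is not the filter; it is having live, verifiable volume and liquidity figures to feed it, which is what the grounded answer supplies.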
7. HYPE orderbook depth and slippage

Prompt
“For HYPE on Hyperliquid, show the current orderbook depth and estimate slippage for a 50,000 dollar sell order.”
Returned capability_miss — the current marketplace coverage could not resolve venue-specific Hyperliquid orderbook data.
Differentiation: Low — This is an active gap. Context’s honest “I can’t answer that” is arguably safer than Gemini’s synthetic answer, but a capability-miss isn’t a win.
8. ETH funding arbitrage across venues

Prompt
“Compare current ETH funding on Hyperliquid versus Binance and say whether there is a funding arbitrage spread right now.”
Venue-scoped capability_miss for Hyperliquid.
Differentiation: Low
9. HYPE liquidation-cascade risk

Prompt
“Is HYPE at risk of a liquidation cascade right now based on open interest and funding?”
Partially grounded. Did not claim a confirmed cascade, surfaced insufficient_data for the venue-specific part, but returned concrete OI and funding signals from the data that was available.
Differentiation: Moderate — Context was safer by refusing to overclaim. Gemini sounded more confident but had nothing behind it.
10. HYPE recent trades

Prompt
“Show recent HYPE trades on Hyperliquid and tell me whether buyers or sellers are in control over the last few prints.”
capability_miss.
Differentiation: Low

What this proves

The biggest win mode is trust, not eloquence

Free LLMs answer live-data questions confidently and synthetically rather than refusing. Context refuses when it can’t ground an answer — and that refusal is the feature. Agents that settle USDC against a response need grounded data, not prose.

The wedge is real on the wins

The 6 high-differentiation cases aren’t marginal. They’re live-venue market discovery, exact liquidity / flow / funding requests, and multi-source synthesis where fresh numbers matter. These are queries that agents genuinely can’t answer from their training data.

Where the wedge is still weak

We are shipping honest numbers here. Three Hyperliquid venue-specific prompts (7, 8, 10) returned capability_miss because the marketplace doesn’t yet have the contributor coverage for those exact live surfaces. Query 9 was safer than Gemini but also didn’t fully close the venue-specific liquidation-cascade ask. These are carry-forward quality gaps, not reasons the wedge doesn’t exist. Use the 6 high-differentiation prompts as the PMF story; treat the 3 Hyperliquid gaps as the roadmap.

Try it yourself

The queries above are reproducible. Every Context-side answer came from the same public SDK surface your agent can hit today:
import { ContextClient } from "@ctxprotocol/sdk";

const client = new ContextClient({ apiKey: process.env.CONTEXT_API_KEY });

const result = await client.query.run({
  query:
    "Is BTC flowing into or out of exchanges on Coinglass? Search for news explaining the flow.",
});

console.log(result.answer);
Your agent pays per response in USDC. No subscription, no KYC, no invented numbers.
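Once a result is back, the agent has to branch before it spends anything further. A minimal dispatch sketch, assuming the response exposes a status alongside the answer (the field names mirror the labels used in this post, not the SDK's confirmed types):

```typescript
// Assumed response shape; check the SDK's own types before relying on this.
interface ContextResult {
  status: "grounded" | "insufficient_data" | "capability_miss";
  answer?: string;
}

// Route each outcome: use grounded data, hedge on partial data,
// and escalate or fall back on a capability miss. Never invent numbers.
function handleResult(result: ContextResult): string {
  switch (result.status) {
    case "grounded":
      return result.answer ?? "";
    case "insufficient_data":
      return "Partial data only; do not settle a trade on this.";
    case "capability_miss":
      return "The marketplace cannot answer this venue-specific query yet.";
  }
}
```

This is the behavior the moderate and low rows above are describing: the refusal paths are explicit outcomes the agent can handle, rather than confident prose it has to second-guess.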

Start with the SDK

Drop the Lite MCP into your agent in one config block.

How the runtime works

See the full architectural story behind the grounded answers above.