Benchmark date: April 2026 · Sample: 10 live crypto queries · Context side: production ContextClient.query.run() on the live marketplace · Baseline: raw Gemini web chat with no tools and no browsing.
The one-sentence version
Free LLMs don’t refuse questions they can’t ground. They answer them anyway, with confident, plausible, specific numbers that look live and aren’t. Context returns grounded answers with a verifiable source trail, or it says it can’t.
Why this matters more than “LLMs are stale”
The usual pitch
“Generic LLMs can’t access live data.”
True, but weak. Every AI company says this. Buyers have heard it a thousand times, and it doesn’t prove you’re different, just that your competitor shipped a tool-use loop.
The real pitch
“Generic LLMs will confidently make up live market data rather than refuse. Context won’t: it either returns grounded numbers or it tells you the marketplace can’t answer.”
That’s a trust argument, not a capability argument. And it’s the argument agents actually care about when they’re settling payments against a response.
Headline result
High differentiation
6 / 10 queries
Context returned a grounded answer with exact numbers, source venues, or verifiable URLs. Raw Gemini produced synthetic-looking “current” market narration with no grounding.
Moderate
1 / 10 queries
Both answered, but Context was safer: it surfaced insufficient_data rather than overclaim, where Gemini produced a confident-but-ungrounded liquidation narrative.
Low / gap
3 / 10 queries
Venue-specific Hyperliquid prompts where Context’s current marketplace coverage returned capability_miss. These are carry-forward targets, not shipped wins.
The 6 high-differentiation wins are where you should see the wedge clearly. The 3 capability-misses are where the marketplace still needs to expand. We ship both numbers because the gaps are as informative as the wins.
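The headline tallies can be re-derived from the per-query differentiation ratings reported on this page. The following is a minimal sketch in Python; the dictionary layout and the function name `tally` are illustrative choices, and only the ratings themselves come from the benchmark.

```python
from collections import Counter

# Differentiation rating per query number, as reported in the
# side-by-side section of this page.
RATINGS = {
    1: "high", 2: "high", 3: "high", 4: "high", 5: "high", 6: "high",
    7: "low", 8: "low", 9: "moderate", 10: "low",
}


def tally(ratings):
    """Count how many queries fall into each differentiation bucket."""
    return Counter(ratings.values())


counts = tally(RATINGS)
for bucket in ("high", "moderate", "low"):
    print(f"{bucket}: {counts[bucket]} / {len(RATINGS)} queries")
```

Keeping the per-query ratings as the single source of truth means the 6 / 1 / 3 split cannot drift from the individual results when the benchmark is re-run.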
Side-by-side: every query
1. BTC flows + news context
Prompt: “Is BTC flowing into or out of exchanges on Coinglass? Search for news explaining the flow.”
Context: Grounded answer combining an on-chain flow tool with a news search tool:
- +1,599.54 BTC: 1-day net inflow
- +4,703.98 BTC: 7-day net inflow
- -19,768.55 BTC: 30-day net outflow
- News context explaining the shift in the 7-day trend
Differentiation: High. This is the cleanest “impossible without us” example. Context is a single answer with live flow data and supporting news. The baseline is vibes.
2. Exchange balance pressure (bullish vs. bearish)
Prompt: “Analyze exchange balance pressure for BTC and tell me whether flows look bullish or bearish right now.”
Context: bearish_distribution_pressure verdict, score 35/100, with concrete 1d / 7d / 30d flow metrics and exchange concentration data.
Differentiation: High
3. Crypto derivatives market overview
Prompt: “Give me a crypto derivatives market overview right now, including sentiment and key risk signals.”
Context: Multi-tool answer with a concrete RISK_OFF regime call, Fear & Greed 11, active liquidation hazard, crowding metrics, and trigger-level recommendations.
Differentiation: High
4. Cross-asset OI divergence scan
Prompt: “Scan BTC, ETH, SOL, and XRP for open-interest divergence and tell me which setup looks the most stretched.”
Context: Grounded OI divergence scan with explicit per-asset deltas and a clear conclusion: SOL was the most actionable stretched setup at the time of the run.
Differentiation: High. Context does real scan-and-rank retrieval. The baseline mimics the format.
5. Multi-indicator BTC valuation score
Prompt: “Give me a BTC valuation score using AHR999, rainbow chart, stock-to-flow, and bubble indicators.”
Context: Grounded answer with an 8/10 valuation score and explicit AHR999, Fear & Greed, Puell Multiple, and Bubble Index values, plus an explicit note that Rainbow Chart and Stock-to-Flow values were not found in the retrieved dataset.
Differentiation: High. Context was more conservative than Gemini by flagging missing data. That conservatism is the trust signal.
6. Live Polymarket liquidity filter
Prompt: “Find live crypto markets on Polymarket with total volume above 100k.”
Context: One live qualifying market identified:
- Market: What price will Bitcoin hit in 2026?
- Total volume: exact dollar figure
- Liquidity: exact dollar figure
- 24h volume: exact dollar figure
- Direct URL returned
Differentiation: High. The most commercial prompt in the set: answer a real market-discovery question with a tradable venue and URL, or don’t.
7. HYPE orderbook slippage estimate
Prompt: “For HYPE on Hyperliquid, show the current orderbook depth and estimate slippage for a 50,000 dollar sell order.”
Context: Returned capability_miss. The current marketplace coverage could not resolve venue-specific Hyperliquid orderbook data.
Differentiation: Low. This is an active gap. Context’s honest “I can’t answer that” is arguably safer than Gemini’s synthetic answer, but a capability-miss isn’t a win.
8. ETH funding arbitrage — Hyperliquid vs. Binance
Prompt: “Compare current ETH funding on Hyperliquid versus Binance and say whether there is a funding arbitrage spread right now.”
Context: Venue-scoped capability_miss for Hyperliquid.
Differentiation: Low
9. HYPE liquidation cascade risk
Prompt: “Is HYPE at risk of a liquidation cascade right now based on open interest and funding?”
Context: Partially grounded. Did not claim a confirmed cascade, surfaced insufficient_data for the venue-specific part, but returned concrete OI and funding signals from the data that was available.
Differentiation: Moderate. Context was safer by refusing to overclaim. Gemini sounded more confident but had nothing behind it.
10. HYPE trade-tape flow analysis
Prompt: “Show recent HYPE trades on Hyperliquid and tell me whether buyers or sellers are in control over the last few prints.”
Context: capability_miss.
Differentiation: Low
What this proves
The biggest win mode is trust, not eloquence
Free LLMs answer live-data questions confidently and
synthetically rather than refusing. Context refuses when it can’t
ground an answer — and that refusal is the feature. Agents that
settle USDC against a response need grounded data, not prose.
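One way an agent could act on this is to gate settlement on the outcome of the response. The following is a minimal sketch in Python under stated assumptions: the `Answer` shape, the `should_settle` helper, and the `grounded` status label are hypothetical and are not the SDK's actual types; only `insufficient_data` and `capability_miss` are names that appear in this benchmark.

```python
from dataclasses import dataclass, field

# Assumed status labels. "insufficient_data" and "capability_miss" appear in
# this benchmark; "grounded" is a hypothetical name for the success case.
GROUNDED = "grounded"
INSUFFICIENT_DATA = "insufficient_data"
CAPABILITY_MISS = "capability_miss"


@dataclass
class Answer:
    """Assumed response shape for illustration, not the SDK's real type."""
    status: str
    text: str = ""
    sources: list = field(default_factory=list)


def should_settle(answer: Answer) -> bool:
    """Pay only for answers that are grounded and carry a source trail."""
    return answer.status == GROUNDED and len(answer.sources) > 0
```

Under this policy a refusal is never paid for, and neither is a fluent answer that arrives without sources, which is the behavior the trust argument depends on.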
The wedge is real on the wins
The 6 high-differentiation cases aren’t marginal. They’re
live-venue market discovery, exact liquidity / flow / funding
requests, and multi-source synthesis where fresh numbers matter.
These are queries that agents genuinely can’t answer from their
training data.
Where the wedge is still weak
Try it yourself
The queries above are reproducible. Every Context-side answer above came from the same public SDK surface your agent can hit today.
Start with the SDK
Drop the Lite MCP into your agent in one config block.
How the runtime works
See the full architectural story behind the grounded answers above.

