What the MCP? (Part 3): When Code LLMs Need Help

Everyone's adding MCP servers. Few are thinking about how they'll actually be called correctly.


Introduction

Everyone’s racing toward fully autonomous agents. The vision is compelling: AI that tolerates failure, recovers gracefully, and keeps marching toward its goals. And with 2,000+ MCP servers now in the registry, the tooling ecosystem is exploding.

But here’s what nobody’s talking about: what happens when the LLM doesn’t have all the info it needs to call a tool?

The MCP folks saw this coming: they built something called Elicitation. Most clients don’t support it yet. I built Quick Call with it from day one, and only realized I was ahead of the curve when I found out Claude Code doesn’t support it.

Let me show you what I mean.


The Scenario: “Send hi to Slack”

Same request. Two very different execution paths.

Quick Call (with elicitation)

Quick Call: one tool call, user picks channel inline

What happens:

  1. User: “Send hi to Slack”
  2. Quick Call MCP server recognizes channel is missing
  3. Server pauses, shows dropdown: “Which channel?”
  4. User picks #general
  5. Message sent

Result: One tool call. One user interaction. Done.
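To make that concrete, here’s a rough sketch of what a Slack tool built this way might look like, using FastMCP’s ctx.elicit() (covered in detail under “How It Works” below). The channel options and the post_to_slack helper are illustrative stand-ins, not Quick Call’s actual implementation:

```python
from typing import Optional

from fastmcp import FastMCP
from fastmcp.server.dependencies import get_context

mcp = FastMCP("slack-demo")  # illustrative server name


async def post_to_slack(channel: str, message: str) -> None:
    """Hypothetical stand-in for the real Slack API call."""
    print(f"[slack] {channel}: {message}")


@mcp.tool()
async def send_message(message: str, channel: Optional[str] = None) -> dict:
    ctx = get_context()

    # Channel missing? Pause and ask the user instead of failing or guessing.
    if not channel:
        result = await ctx.elicit(
            message="Which channel?",
            response_type=["#general", "#random", "#engineering"],  # example options
        )
        if result.action != "accept":
            return {"error": "Cancelled by user"}
        channel = result.data

    await post_to_slack(channel, message)
    return {"status": "sent", "channel": channel}
```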


Claude Code (without elicitation)

Claude Code: two tool calls, extra round-trip

What happens:

  1. User: “Send hi to Slack”
  2. Claude thinks: “I need to know which channel”
  3. Claude calls list_channels -> gets channel list back
  4. Claude presents options: “Which channel?”
  5. User types: #general
  6. Claude calls send_message(channel="#general", message="hi")
  7. Done

Result: Two tool calls. Extra tokens. Extra latency.
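For contrast, a server without elicitation has to expose a discovery tool next to the action tool so the LLM can fill the gap itself. A hypothetical sketch of that two-tool surface (not any real Slack MCP server):

```python
from fastmcp import FastMCP

mcp = FastMCP("slack-no-elicitation")  # illustrative server name


@mcp.tool()
async def list_channels() -> list[str]:
    """Discovery tool the LLM calls first, just to learn the channel names."""
    return ["#general", "#random", "#engineering"]  # example data


@mcp.tool()
async def send_message(channel: str, message: str) -> dict:
    """Action tool; channel is required, so the LLM must obtain it somehow."""
    print(f"[slack] {channel}: {message}")
    return {"status": "sent", "channel": channel}
```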

To be clear: Claude Code is being smart here. It figured out it needed more info and found a workaround. But it’s still a workaround.

The difference? Elicitation lets the tool ask for what it needs. Without it, the LLM has to figure out how to get that info itself.

Think about it: who knows better what parameters a tool needs -> the tool or the LLM guessing from a description? The tool, obviously. Elicitation puts the tool in control of gathering its own inputs. That’s the fundamental shift.


The Cost of Being Clever

Every Extra Tool Call = $$$

The math is simple:

  • Each tool call = input tokens (tool definitions) + output tokens (response)
  • Extra list_channels call: ~500-1000 tokens round-trip
  • At scale: 1,000 messages/day × 500 tokens = 500K extra tokens/day

What does that cost?

| Model | Input (per 1M) | Output (per 1M) | Daily | Monthly |
|---|---|---|---|---|
| Claude Opus 4.5 | $5 | $25 | ~$5 | ~$150 |
| GPT-4o | $2.50 | $10 | ~$2 | ~$68 |

That’s $70-150/month for one feature’s inefficiency. Multiply by every tool that needs user input.
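A quick back-of-envelope check of those numbers (assuming, for illustration, a 75/25 input/output split on the extra tokens):

```python
# 1,000 extra list_channels calls/day at ~500 tokens each,
# priced at Claude Opus 4.5 rates ($5/1M input, $25/1M output).
tokens_per_day = 1_000 * 500                 # 500K extra tokens/day
input_tokens = tokens_per_day * 0.75         # assumed split
output_tokens = tokens_per_day * 0.25

daily = input_tokens * 5 / 1e6 + output_tokens * 25 / 1e6
print(f"~${daily:.2f}/day, ~${daily * 30:.0f}/month")   # ~$5.00/day, ~$150/month
```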


Beyond Cost: Reliability

Anthropic’s own benchmarks tell the story. From their Opus 4.5 announcement:

Scaled tool use (MCP Atlas):

| Model | Score | Failure Rate |
|---|---|---|
| Opus 4.5 | 62.3% | ~38% |
| Sonnet 4.5 | 43.8% | ~56% |
| Opus 4.1 | 40.9% | ~59% |

Even the best model fails 38% of the time on complex tool use scenarios. And that’s Opus 4.5 -> Anthropic’s flagship. Fewer tool calls = fewer chances to fail.


Latency Adds Up

Each tool call involves:

  • Model inference time
  • API round-trip
  • Response parsing

Claude Code’s workaround means 2x the wait time. The user sits there while the LLM fetches the channel list, processes it, formats the question, waits for input, then makes another call.

With elicitation? The tool pauses, asks, continues. One smooth interaction.

So how do we fix this?


Where Elicitation Shines

Use Cases

| Scenario | Without Elicitation | With Elicitation |
|---|---|---|
| Ambiguity | Fail or guess wrong | Ask: “Which subscription to cancel?” |
| Confirmation | Proceed blindly | Ask: “Type workspace name to confirm delete” |
| Missing params | Extra tool call or error | Ask: “Enter your API key” |
| Progressive input | Front-load everything upfront | Collect step-by-step as needed |
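As an example of the confirmation row, a destructive tool might make the user type the name back before proceeding. This is an illustrative sketch, not an actual Quick Call tool:

```python
from fastmcp import FastMCP
from fastmcp.server.dependencies import get_context

mcp = FastMCP("workspace-admin")  # illustrative server name


@mcp.tool()
async def delete_workspace(workspace: str) -> dict:
    ctx = get_context()

    # Destructive action: require the user to type the name back to confirm.
    result = await ctx.elicit(
        message=f"Type '{workspace}' to confirm deletion",
        response_type=str,
    )
    if result.action != "accept" or result.data != workspace:
        return {"error": "Deletion not confirmed"}

    # ... perform the actual deletion here ...
    return {"status": "deleted", "workspace": workspace}
```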

See It In Action

I’ve open-sourced a demo app that showcases Quick Call’s elicitation framework: quickcall-mcp-elicitation

The prompt is deliberately vague: “Schedule a meeting” -> no title, no participants, no time. The tool collects what it needs progressively through elicitation. One tool call, multiple user inputs, zero extra LLM round-trips.

Meeting scheduler with progressive elicitation

Meeting scheduler with progressive elicitation

Here’s how the flow works:

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant Backend
    participant MCP as MCP Server

    User->>Frontend: "Schedule a meeting"
    Frontend->>Backend: Forward request
    Backend->>MCP: Execute tool

    rect rgb(255, 243, 224)
        Note over MCP: Elicitation: Get title
        MCP->>Frontend: ctx.elicit("title?")
        Frontend->>User: Show text input
        User->>Frontend: "Weekly Standup"
        Frontend->>MCP: Resume with value
    end

    rect rgb(255, 243, 224)
        Note over MCP: Elicitation: Get participants, duration, time
    end

    MCP-->>Backend: Meeting created
    Backend-->>Frontend: Response
    Frontend->>User: "Meeting scheduled!"
```

The tool pauses at each ctx.elicit() call, collects input via SSE, and resumes.

Wait, aren’t those still round-trips?

Each ctx.elicit() is a round-trip between backend and frontend: SSE event out, user responds, POST back, tool resumes. But critically, it’s not an LLM round-trip. The LLM calls schedule_meeting once. That single tool execution handles all user interactions internally. The LLM doesn’t re-enter the loop until the tool returns.


How It Works

Server Side: ctx.elicit()

In your MCP tool, call ctx.elicit() when you need user input:

```python
from typing import Optional

from fastmcp import FastMCP
from fastmcp.server.dependencies import get_context

mcp = FastMCP("meeting-scheduler")  # server instance (name illustrative)


@mcp.tool()
async def schedule_meeting(title: Optional[str] = None, duration: Optional[str] = None):
    ctx = get_context()

    # Free text input
    if not title:
        result = await ctx.elicit(
            message="What should the meeting be called?",
            response_type=str,
        )
        if result.action == "cancel":
            return {"error": "Cancelled by user"}
        title = result.data

    # Single select from options
    if not duration:
        result = await ctx.elicit(
            message="How long should the meeting be?",
            response_type=["30 minutes", "1 hour", "2 hours"],
        )
        duration = result.data

    return {"title": title, "duration": duration}
```

response_type determines the UI:

  • str -> text input
  • ["option1", "option2"] -> single select buttons
  • int, bool -> appropriate input fields
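For instance, numeric and boolean prompts follow the same pattern (a minimal sketch, assuming the same FastMCP setup as above):

```python
from fastmcp import FastMCP
from fastmcp.server.dependencies import get_context

mcp = FastMCP("demo")


@mcp.tool()
async def plan_meeting() -> dict:
    ctx = get_context()

    # int renders a number field, bool a yes/no choice
    attendees = await ctx.elicit(message="How many attendees?", response_type=int)
    recurring = await ctx.elicit(message="Is this recurring?", response_type=bool)

    return {"attendees": attendees.data, "recurring": recurring.data}
```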

Client Side: Handle the pause

When ctx.elicit() is called, your client receives an SSE event:

```json
{
  "type": "elicitation_request",
  "elicitation_id": "chat_abc123",
  "message": "What should the meeting be called?",
  "options": null
}
```

Render the UI, collect input, POST back:

```
POST /elicitation/respond
{
  "elicitation_id": "chat_abc123",
  "response": {"action": "accept", "value": "Weekly Standup"}
}
```

The tool resumes from where it paused. That’s it.
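For a non-browser client, the same loop can be sketched in a few lines of Python. This assumes the /elicitation/respond endpoint shown above plus a hypothetical BASE_URL; a real frontend would render inputs instead of reading stdin:

```python
import requests

BASE_URL = "http://localhost:8000"  # hypothetical; point at the demo backend


def handle_elicitation_event(event: dict) -> None:
    """Handle one parsed 'elicitation_request' SSE event and answer it."""
    if event.get("type") != "elicitation_request":
        return

    # A real UI would render a text input or option buttons here.
    if event.get("options"):
        print("Options:", ", ".join(event["options"]))
    value = input(f"{event['message']} ")

    requests.post(
        f"{BASE_URL}/elicitation/respond",
        json={
            "elicitation_id": event["elicitation_id"],
            "response": {"action": "accept", "value": value},
        },
        timeout=10,
    )
```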


Current Client Support

| Client | Elicitation | Notes |
|---|---|---|
| Claude Code | No | Issue #2799 - 106 upvotes, assigned but no timeline |
| Quick Call | Yes | Built-in from day one |
| GitHub Copilot | Yes | Shipped Dec 2025 - VS Code, VS 2026, JetBrains |
| Cursor | Yes | Shipped - supports string, number, boolean, enum schemas |

When I built Quick Call, elicitation was already available in FastMCP. I used it because making users re-prompt when a parameter was missing felt wrong. I’m looking forward to seeing Claude Code support this.


Final Thoughts

Elicitation isn’t UX polish. It’s the difference between tools that ask for what they need and LLMs that scramble to figure it out themselves.

Fewer tool calls. Fewer tokens. Fewer failures. Better UX.

Cursor and Copilot already support it. Claude Code will get there. Until then, build your tools right -> assume elicitation exists, and let your tools do the asking.


The MCP elicitation demo is open-sourced: quickcall-mcp-elicitation

Try Quick Call: Now with Claude Code integration -> quickcall.dev/claude-code

Catch up: Part 1: What the MCP? | Part 2: I Built Quick Call


Resources