Designing Prompts That Ship With Your MCP Server
Nobody talks about the MCP prompt API. Every tutorial covers tools, most cover resources, and the prompt API gets a paragraph at the end as if it were a footnote. This is a mistake. The prompt API is the feature that transforms your MCP server from a collection of callable functions into something with genuine discoverability — a library of reusable, parameterized workflows that both AI clients and human users can browse, understand, and invoke.
The analogy I keep reaching for is the difference between a library of functions and a library of functions with docstrings, type signatures, and usage examples. Both work. One is dramatically more useful to anyone who did not write it.
This post is about building prompts that are actually worth shipping: parameterized, versioned, documented, and designed so that clients can present them meaningfully to end users.
What the Prompt API Is and Is Not
The MCP prompt API allows your server to advertise named prompt templates via prompts/list. A client can call prompts/get with a prompt name and arguments to receive a fully-rendered list of messages ready to send to an LLM. The server does the rendering; the client does the LLM call.
This is different from tools in a critical way: tools are for taking actions and returning data. Prompts are for generating a structured conversation — a sequence of user and assistant messages (the MCP prompt format has no separate system role) — that frames a particular task. The LLM's response to those messages is the output, not anything your server computes directly.
The server's job is to take the arguments (a PR URL, a focus area, a code snippet) and produce a well-crafted message sequence. The client's job is to display the prompt list to the user and forward the rendered messages to the LLM. The client does not need to know anything about prompt engineering.
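To make that division of labor concrete, here is the exchange from the client's side: a minimal sketch using the official TypeScript SDK, where the server command path and the truncated diff are hypothetical.

// A client browsing and invoking prompts (sketch; server path is hypothetical).
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(
  new StdioClientTransport({ command: "node", args: ["dist/server.js"] })
);

// prompts/list: discover what the server offers.
const { prompts } = await client.listPrompts();
console.log(prompts.map((p) => p.name)); // e.g. ["review_pull_request", ...]

// prompts/get: the server renders; the client forwards result.messages to the LLM.
const result = await client.getPrompt({
  name: "review_pull_request",
  arguments: { diff: "--- a/src/auth.ts\n+++ b/src/auth.ts\n...", focus: "security" },
});

Note that prompt argument values are strings on the wire, which matters for parameter design later in this post.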
Anatomy of a Well-Designed Prompt
The prompts/list response shape determines discoverability. Every prompt needs three things done well:
- A name that is a stable identifier — snake_case, no version numbers in the name itself.
- A description that a human (or LLM parsing the capability list) can understand without additional context.
- An arguments array that documents each parameter with a description and whether it is required.
// mcp-server/src/prompts/index.ts
import type { Prompt } from "@modelcontextprotocol/sdk/types.js";

export const promptRegistry: Prompt[] = [
  {
    name: "review_pull_request",
    description:
      "Generate a structured code review for a pull request. Focuses on correctness, performance, and security. Returns a review organized by severity.",
    arguments: [
      {
        name: "diff",
        description: "The unified diff of the PR. Paste the output of `git diff main...HEAD`.",
        required: true,
      },
      {
        name: "focus",
        description:
          "Optional focus area: 'security', 'performance', 'style', or 'all'. Defaults to 'all'.",
        required: false,
      },
      {
        name: "context",
        description:
          "Optional: a sentence describing what the PR is trying to accomplish. Helps the reviewer understand intent.",
        required: false,
      },
    ],
  },
  {
    name: "explain_error",
    description:
      "Given a stack trace or error message, explain what went wrong and suggest three specific fixes ordered by likelihood.",
    arguments: [
      {
        name: "error",
        description: "The full error message or stack trace.",
        required: true,
      },
      {
        name: "language",
        description: "Programming language or runtime: 'typescript', 'python', 'java', 'go', etc.",
        required: false,
      },
    ],
  },
  {
    name: "generate_migration",
    description:
      "Generate a database migration SQL file from a description of the schema change. Produces both up and down migrations.",
    arguments: [
      {
        name: "change_description",
        description:
          "Plain-English description of what needs to change. E.g. 'Add a nullable avatar_url column to the users table'.",
        required: true,
      },
      {
        name: "dialect",
        description: "SQL dialect: 'postgresql', 'mysql', or 'sqlite'. Defaults to 'postgresql'.",
        required: false,
      },
    ],
  },
];

The description on each argument is as important as the description on the prompt itself. A client that presents this list to a human user needs enough information to render a useful form. A client that is an AI agent needs enough information to decide whether to invoke the prompt and what to pass.
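None of this reaches a client until the registry is wired into the server. A minimal sketch, assuming the SDK's low-level Server class and the handleGetPrompt dispatcher defined in the validation section below:

// mcp-server/src/server.ts
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import {
  ListPromptsRequestSchema,
  GetPromptRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";
import { promptRegistry } from "./prompts";
import { handleGetPrompt } from "./prompts/handlers";

const server = new Server(
  { name: "example-server", version: "1.0.0" },
  { capabilities: { prompts: {} } }
);

// prompts/list: return the static registry verbatim.
server.setRequestHandler(ListPromptsRequestSchema, async () => ({
  prompts: promptRegistry,
}));

// prompts/get: delegate to the per-prompt dispatcher.
server.setRequestHandler(GetPromptRequestSchema, async (request) =>
  handleGetPrompt(request.params.name, request.params.arguments ?? {})
);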
Rendering Prompts — The Server's Core Responsibility
When a client calls prompts/get, your server must return an array of messages. This is where the engineering work is. A trivial implementation just interpolates arguments into a template string. A good implementation validates arguments, handles defaults, assembles context from resources, and produces messages that are genuinely well-crafted.
// mcp-server/src/prompts/review-pull-request.ts
import type { GetPromptResult } from "@modelcontextprotocol/sdk/types.js";

interface ReviewArgs {
  diff: string;
  focus?: string;
  context?: string;
}

const FOCUS_INSTRUCTIONS: Record<string, string> = {
  security:
    "Pay particular attention to: injection vulnerabilities, authentication bypasses, insecure direct object references, hardcoded secrets, and unsafe deserialization.",
  performance:
    "Pay particular attention to: N+1 queries, missing indexes, unnecessary allocations, synchronous I/O in hot paths, and missing caching.",
  style:
    "Pay particular attention to: naming conventions, function length, cyclomatic complexity, dead code, and consistency with the existing patterns in the diff.",
  all: "Evaluate correctness, security, performance, and code style. Prioritize issues by severity.",
};

export function renderReviewPrPrompt(args: ReviewArgs): GetPromptResult {
  const focus = args.focus ?? "all";
  const focusInstruction = FOCUS_INSTRUCTIONS[focus] ?? FOCUS_INSTRUCTIONS["all"];

  const systemMessage = `You are a senior software engineer performing a code review. Be direct and specific. For each issue, cite the exact line or function. ${focusInstruction}
Format your response as:
## Critical Issues
(Must fix before merge. Bugs, security vulnerabilities, data loss risks.)
## Major Issues
(Should fix before merge. Performance problems, significant style violations, missing tests.)
## Minor Issues
(Nice to fix. Style nits, minor improvements.)
## Positive Observations
(What the PR does well — at least two items.)`;

  const userParts = [
    args.context ? `**Context:** ${args.context}\n` : "",
    `**Diff:**\n\`\`\`diff\n${args.diff}\n\`\`\``,
  ]
    .filter(Boolean)
    .join("\n");

  return {
    description: `Code review with focus: ${focus}`,
    messages: [
      // MCP prompt roles are limited to "user" and "assistant", so the
      // instruction block goes first as a user message.
      { role: "user", content: { type: "text", text: systemMessage } },
      { role: "user", content: { type: "text", text: userParts } },
    ],
  };
}

Notice the separation: the first message establishes the reviewer's role and output format, and the second carries the actual content. Because MCP prompt messages only allow the user and assistant roles, these system-style instructions travel as the leading user message rather than a true system message. This separation produces better LLM outputs than jamming everything into one message, and it is more transparent to clients that inspect the message structure.
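Invoking the renderer directly shows the exact shape a client receives. A quick sketch, with a hypothetical truncated diff:

const rendered = renderReviewPrPrompt({
  diff: "--- a/src/auth.ts\n+++ b/src/auth.ts\n@@ -12,7 +12,7 @@ ...",
  focus: "security",
});

// rendered.messages is forwarded to the LLM verbatim by the client:
//   [0] user: "You are a senior software engineer performing a code review. ..."
//   [1] user: "**Diff:**\n```diff\n--- a/src/auth.ts\n..."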
Parameter Design That Scales
Bad parameter design is the most common failure mode in MCP prompts. The anti-patterns:
Too few parameters, too much content in one field. If your only argument is "query", users will put complex structured data in a text field and you will get inconsistent results.
Too many parameters with no defaults. If 6 of your 8 arguments are required, the prompt is not usable from a quick interaction. Make the common case easy.
Enum values without a default. If an argument is an enum, it almost always has a sensible default. State it explicitly.
No validation on required arguments. Return a meaningful error when required arguments are missing or malformed — do not silently produce a bad prompt.
// mcp-server/src/prompts/handlers.ts
import { z } from "zod";
import { McpError, ErrorCode } from "@modelcontextprotocol/sdk/types.js";
import { renderReviewPrPrompt } from "./review-pull-request";

const ReviewArgsSchema = z.object({
  diff: z.string().min(10, "diff must contain at least 10 characters"),
  focus: z.enum(["security", "performance", "style", "all"]).default("all"),
  context: z.string().max(500).optional(),
});

export function handleGetPrompt(name: string, args: unknown) {
  if (name === "review_pull_request") {
    const parsed = ReviewArgsSchema.safeParse(args);
    if (!parsed.success) {
      throw new McpError(
        ErrorCode.InvalidParams,
        `Invalid arguments for review_pull_request: ${parsed.error.message}`
      );
    }
    return renderReviewPrPrompt(parsed.data);
  }
  // Per the MCP spec, an unknown prompt name is an invalid-params error
  // (-32602), not MethodNotFound: prompts/get itself is a valid method.
  throw new McpError(ErrorCode.InvalidParams, `Unknown prompt: ${name}`);
}

Versioning Prompts Without Breaking Clients
Prompts are consumed by clients that may cache them or present them to users as known workflows. Changing a prompt's behavior is a breaking change if clients are relying on a specific output format. The solution is to version at the description level, not the name level, and to maintain backwards compatibility for at least one version cycle.
My convention: the name is stable and never includes a version. The description includes [v2] when behavior changes significantly. Old names are kept as deprecated entries in prompts/list with a description that says "deprecated — use X instead" for one release cycle, then removed.
// Deprecated entry — kept for one release
{
  name: "review_pr", // old name
  description: "[DEPRECATED: use review_pull_request] Generate a PR review.",
  arguments: [
    { name: "diff", description: "The PR diff.", required: true },
  ],
}

If you need to make the rendering logic significantly more capable, add arguments with defaults so existing callers continue to work. Only remove arguments when the old behavior is genuinely unsupportable.
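For example, suppose a v2 of the review prompt gains a hypothetical max_comments knob. MCP prompt arguments arrive as strings on the wire, so a coerced, defaulted field keeps every v1 caller working unchanged. A sketch, extending the schema from handlers.ts above:

// v2 schema: one added argument with a default, so v1 callers are unaffected.
// max_comments is a hypothetical addition, not part of the prompt defined above.
const ReviewArgsSchemaV2 = ReviewArgsSchema.extend({
  max_comments: z.coerce.number().int().positive().default(20),
});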
Discovery UX — What Clients See
The prompt list is the primary discovery surface. Design it for the two audiences that will use it: human developers browsing in a client like Claude Desktop, and AI agents deciding at runtime which prompt is appropriate.
For human discovery, the description should answer: what does this do, what do I need to provide, what will I get back? One or two sentences. No jargon beyond what is necessary.
For agent discovery, the description must set the prompt apart from its neighbors in the same list. If you have three prompts that all "generate" something, each should be identifiable from its description alone: an agent picking between generate_migration, generate_test_suite, and generate_api_docs needs to make the right choice without calling prompts/get on all three.
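A sketch of the contrast, using hypothetical registry entries:

// Hypothetical registry entries. Each description names its input, its output,
// and its scope, so an agent can tell them apart without calling prompts/get.
const distinguishable = [
  {
    name: "generate_test_suite",
    description:
      "Given a source file, generate a unit test suite covering its exported functions. Output: a test file using the project's existing test framework.",
  },
  {
    name: "generate_api_docs",
    description:
      "Given a source file, generate reference documentation for its public API. Output: markdown with one section per exported symbol.",
  },
];

// Compare the indistinguishable versions an agent can only guess between:
//   "Generate tests."  /  "Generate documentation."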
Composing Prompts With Resources
The most powerful prompts pull in resource content at render time. A review_pull_request prompt that fetches the PR from GitHub, the relevant file history, and the team's coding standards document — and assembles all of that into a message sequence — is dramatically more useful than one that asks the user to paste a diff.
// mcp-server/src/prompts/review-with-context.ts
import type { GetPromptResult } from "@modelcontextprotocol/sdk/types.js";
import { resourceRegistry } from "../resources";

export async function renderReviewWithContext(args: {
  prUrl: string;
  focus?: string;
}): Promise<GetPromptResult> {
  // Fetch the diff from GitHub
  const diff = await resourceRegistry.fetch(`github://pr/${encodeURIComponent(args.prUrl)}/diff`);

  // Fetch team coding standards if available
  const standards = await resourceRegistry.fetchOptional("file://docs/coding-standards.md");

  const systemParts = [
    "You are a senior software engineer performing a code review. Be direct and specific.",
    standards ? `**Team Coding Standards:**\n${standards}` : "",
  ]
    .filter(Boolean)
    .join("\n\n");

  return {
    description: `Code review for ${args.prUrl}`,
    messages: [
      { role: "user", content: { type: "text", text: systemParts } },
      {
        role: "user",
        content: {
          type: "text",
          text: `Review this pull request with focus on: ${args.focus ?? "all"}\n\n\`\`\`diff\n${diff}\n\`\`\``,
        },
      },
    ],
  };
}

This is the point where prompts become genuinely compelling: the client asks for a prompt, the server does complex work to assemble context from multiple sources, and the client gets a fully-prepared message sequence it can immediately send to the LLM. The client does not need to understand GitHub APIs, file fetching, or prompt engineering. It just calls prompts/get and forwards the result.
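The resourceRegistry used above is this server's own facade, not an SDK primitive. A minimal sketch of one plausible shape, assuming fetch resolves a URI against scheme-specific readers and fetchOptional converts failures into null:

// mcp-server/src/resources/index.ts — hypothetical facade over the server's
// resource readers; the reader list below is illustrative.
type ResourceReader = (uri: URL) => Promise<string>;

const readers: Array<{ scheme: string; read: ResourceReader }> = [
  // e.g. { scheme: "github:", read: readGithubResource },
  //      { scheme: "file:",   read: readLocalFile },
];

export const resourceRegistry = {
  // Resolve a URI to its text content; throw if no reader matches or the read fails.
  async fetch(uri: string): Promise<string> {
    const parsed = new URL(uri);
    const reader = readers.find((r) => r.scheme === parsed.protocol);
    if (!reader) throw new Error(`No resource reader for scheme: ${parsed.protocol}`);
    return reader.read(parsed);
  },

  // Like fetch, but returns null instead of throwing; used for optional context.
  async fetchOptional(uri: string): Promise<string | null> {
    try {
      return await this.fetch(uri);
    } catch {
      return null;
    }
  },
};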
Key Takeaways
- The MCP prompt API is a first-class discoverability mechanism — treat it as a library of parameterized workflows, not an afterthought.
- Argument descriptions matter as much as prompt descriptions; clients render them as form labels and agents use them to decide what to pass.
- Server-side validation with Zod (or equivalent) should reject malformed arguments with a meaningful ErrorCode.InvalidParams error before rendering.
- Version prompts by updating descriptions and keeping deprecated names for one release cycle — never change a prompt's name, only deprecate it.
- The most powerful prompts fetch resources at render time (diffs, docs, schemas) so the client receives a fully-assembled message sequence without needing to understand your data sources.
- Design descriptions to serve two audiences simultaneously: humans browsing a list and AI agents distinguishing between similar-sounding options at runtime.