The Model Context Protocol (MCP) is revolutionizing how AI agents interact with external systems. At Nylas, we recently built and released our own MCP server that enables AI agents to seamlessly interact with email and calendar data. You can check out the official documentation to see what we built.
But here’s the thing: building an MCP server isn’t just about wrapping your existing API endpoints. It requires a fundamental shift in how you think about API design, AI agent capabilities, and security. After months of development, testing, and iteration, we’ve learned a lot—and we’re here to share those hard-won lessons with you.
When we first started building the Nylas MCP server, the obvious question was: should we create a new microservice or integrate it into our existing API service?
We chose the latter. Our API service runs on port 80, and the MCP server runs on port 8080—both handled by different goroutines in the same server process.

Here’s why this decision made sense:
Performance benefits: the MCP server shares a process with the API, so tool calls avoid an extra network hop between services.
Operational simplicity: one binary to build, deploy, and monitor, with no new service to provision.
The trade-off? Slightly more complex code organization. But for our use case, the performance and operational benefits far outweighed the complexity.
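Here’s a minimal sketch of that layout in Go (with stand-in handlers; the real ones live in our API and MCP packages):

package main

import (
	"log"
	"net/http"
)

func main() {
	// Stand-ins for the real REST and MCP handlers.
	apiMux := http.NewServeMux()
	apiMux.HandleFunc("/events", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("api"))
	})

	mcpMux := http.NewServeMux()
	mcpMux.HandleFunc("/mcp", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("mcp"))
	})

	// REST API on port 80, served from its own goroutine.
	go func() {
		log.Fatal(http.ListenAndServe(":80", apiMux))
	}()

	// MCP server on port 8080; this blocking call keeps the process alive.
	log.Fatal(http.ListenAndServe(":8080", mcpMux))
}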

This was perhaps the biggest revelation during development: AI agents are not human developers. When building traditional REST APIs, we follow resource-oriented design patterns (like Google’s AIP-121). We create predictable endpoints:
GET /events/{id} – retrieve one event
GET /events – list all events (with pagination)
POST /events – create an event
PUT /events/{id} – update an event
DELETE /events/{id} – delete an event

This works great for humans. A developer sees POST /events and immediately understands there must be corresponding GET, PUT, and DELETE endpoints. They can read API documentation, understand HTTP methods, and navigate pagination.
AI agents don’t work this way.
An AI agent doesn’t see HTTP methods or endpoints. It doesn’t browse API documentation. Instead, it reads tool descriptions and understands tools by their purpose and use case. Each tool description must be self-contained and explicit about what the tool does, when it should be used, and what inputs it expects and outputs it returns.
This means your tool descriptions need to be much more verbose than your API documentation. You can’t rely on REST conventions or assume the AI will infer relationships between tools.
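For illustration, here’s the shape of a self-contained description (a hypothetical get_availability tool, not our exact production text):

// A self-contained tool description: purpose, when to use it, inputs,
// and outputs, with no reliance on REST conventions.
const getAvailabilityDescription = `Finds open time slots on a user's calendar.

Use this tool when the user asks when they are free, or before scheduling
a meeting, to avoid conflicts.

Inputs: the start and end of the search window as Unix timestamps, and an
IANA timezone (e.g., 'America/New_York').

Returns: a list of open slots, each with start and end Unix timestamps.`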

One common mistake we see developers make: creating an MCP tool for every single API endpoint.
Don’t do this.
Here’s why: each tool has a long, detailed description. Too many tools overwhelm the AI agent, causing it to pick the wrong tool, confuse similar tools with each other, and burn context on descriptions it will never use.
Instead, we took a use-case-driven approach. We analyzed the most popular Nylas use cases: reading and searching email, sending messages, checking availability, and scheduling events.
We deliberately didn’t build a tool for every edge case the API supports. Those are valid use cases, but they’re rarely what someone wants an AI agent to do. By focusing on the 80/20 rule, we kept our tool count to 18, manageable for both the AI agent and our maintenance burden.

AI agents are surprisingly bad at time calculations. They mix up timezones, miscount relative dates like “next Tuesday,” and stumble over daylight saving transitions.
We learned this the hard way during early testing. An agent would try to schedule a meeting for “tomorrow at 10 AM” and end up creating an event in the wrong timezone or on the wrong day.
Our solution: explicit time management tools.
We built three dedicated tools:
- current_time: Returns the current epoch time and date for a given timezone
- datetime_to_epoch: Converts human-readable date/time strings to Unix timestamps
- epoch_to_datetime: Converts Unix timestamps back to human-readable strings

We also implemented a double-check mechanism. When an agent needs to work with time (checking availability, scheduling meetings, listing events), we force it to:
1. Call current_time to get the current time
2. Call datetime_to_epoch to convert user input to epoch time
3. Call epoch_to_datetime to convert back and verify the calculation

This might seem verbose, but it’s the only way to ensure accuracy. Time-related bugs are some of the most frustrating for users, so the extra tool calls are worth it.
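A sketch of what a tool like datetime_to_epoch does under the hood (our production handler has more validation; this helper is illustrative):

package main

import (
	"fmt"
	"time"
)

// datetimeToEpoch converts a date, a time, and an IANA timezone into a
// Unix timestamp. The layouts match the formats documented in the
// jsonschema tags: YYYY-MM-DD and HH:MM:SS.
func datetimeToEpoch(date, clock, timezone string) (int64, error) {
	loc, err := time.LoadLocation(timezone) // e.g., "America/New_York"
	if err != nil {
		return 0, fmt.Errorf("invalid timezone %q: %w", timezone, err)
	}
	t, err := time.ParseInLocation("2006-01-02 15:04:05", date+" "+clock, loc)
	if err != nil {
		return 0, fmt.Errorf("invalid date/time: %w", err)
	}
	return t.Unix(), nil
}

func main() {
	epoch, err := datetimeToEpoch("2025-01-15", "10:00:00", "America/New_York")
	if err != nil {
		panic(err)
	}
	fmt.Println(epoch) // epoch seconds for 10 AM Eastern on Jan 15, 2025
}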

Email sending is a high-risk operation. We’ve all heard horror stories about AI agents sending emails to the wrong people or with unintended content. But there’s a more subtle threat: prompt injection attacks.
A malicious user could craft a prompt that tricks the AI agent into sending messages to unintended recipients, leaking private data from the mailbox, or embedding attacker-supplied content in an outgoing email.
Our defense: mandatory human confirmation.
Before any email is sent, the MCP server forces the AI agent to:

1. Call confirm_send_message (or confirm_send_draft) to generate a confirmation preview
2. Present that preview to the human user
3. Wait for the user to explicitly confirm the send

Any other input aborts the email sending. This ensures that every email sending operation has explicit human oversight.

But wait—what if a malicious actor tries to bypass the confirmation step? They could try to inject system prompts or confirmation text directly into the email body, hoping the AI agent will interpret it as user confirmation.
Enter the secret hash pattern.
We implemented a cryptographic confirmation system:

1. When confirm_send_message is called, it generates a deterministic hash from the message details (recipients, subject, body) and a secret that never leaves the server
2. The confirmation preview returned to the agent includes this confirmation_hash
3. The send_message tool requires this confirmation_hash parameter and verifies it against a freshly computed hash before sending

This means the agent cannot call send_message without completing the confirmation step, text injected into an email body can’t forge a valid hash, and any change to the message after confirmation invalidates the hash.
This pattern ensures tool call order is enforced at the protocol level, not just in documentation.
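For illustration, here’s one way such a confirmation hash can be computed with an HMAC (a sketch of the general pattern, not our exact implementation):

package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// The secret never leaves the server, so text injected into an email
// body cannot be used to forge a valid hash.
var serverSecret = []byte("replace-with-a-real-secret")

// confirmationHash deterministically ties the hash to the exact message
// being confirmed. Changing any field invalidates it.
func confirmationHash(to []string, subject, body string) string {
	mac := hmac.New(sha256.New, serverSecret)
	mac.Write([]byte(strings.Join(to, ",") + "\x00" + subject + "\x00" + body))
	return hex.EncodeToString(mac.Sum(nil))
}

// verifyConfirmation recomputes the hash and compares in constant time.
func verifyConfirmation(got string, to []string, subject, body string) bool {
	want := confirmationHash(to, subject, body)
	return hmac.Equal([]byte(got), []byte(want))
}

func main() {
	to := []string{"alice@example.com"}
	h := confirmationHash(to, "Lunch?", "Pizza at noon?")
	fmt.Println(verifyConfirmation(h, to, "Lunch?", "Pizza at noon?")) // true
	fmt.Println(verifyConfirmation(h, to, "Lunch?", "tampered body"))  // false
}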

MCP servers have three distinct levels of description, each serving a different purpose:
Server instructions: the top-level description that explains what the server does as a whole and how its tools fit together.
Rule: Server instructions should not contain tool descriptions or parameter details.
Tool descriptions: each tool has its own description explaining what it does, when to use it, and what it returns.
Rule: Tool descriptions should not contain parameter annotations or implementation details.
Parameter annotations: each parameter has a JSON schema annotation explaining its type, format, and valid values.
Rule: Keep annotations focused on the parameter itself, not the tool’s overall purpose.
This separation of concerns makes the MCP server easier for AI agents to understand. They can scan server instructions to understand the big picture, read tool descriptions to choose the right tool, and check parameter annotations for implementation details.
In Go, we use jsonschema tags to make parameters self-explanatory:
type datetimeToEpochInput struct {
	Date     string `json:"date" jsonschema:"date in YYYY-MM-DD format"`
	Time     string `json:"time" jsonschema:"time in HH:MM:SS format"`
	Timezone string `json:"timezone" jsonschema:"timezone to use for parsing (e.g., 'America/New_York')"`
}
When writing tool descriptions, examples are crucial.
AI agents learn from examples. A well-written example can clarify expected input formats, provider-specific syntax, and how parameters combine in practice.
For example, our list_messages tool description includes examples for searching messages on each supported provider.
Each example shows the exact query syntax, including how to construct Gmail search queries, Microsoft Graph filters, and IMAP search criteria.
Pro tip: Include examples for all major use cases, not just the happy path. Show how to handle errors, edge cases, and provider-specific quirks.
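To make that concrete, here’s the flavor of such a description (a hypothetical excerpt, not our production text):

// Hypothetical excerpt of a list_messages description with
// provider-specific search examples baked in.
const listMessagesExamples = `Search examples:
- Gmail: subject:pizza (subject contains "pizza")
- Gmail, exact phrase: "pizza party" (quotes required)
- Microsoft Graph: $search="subject:pizza"
- IMAP: SUBJECT "pizza"
If the provider rejects a query, retry without provider-specific
operators and filter the results afterwards.`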

AI agents have a habit of passing optional parameters even when they’re not needed. For example, if you’re not searching messages by subject, the agent might still try to pass subject: "" or subject: null.
The solution: explicitly tell the AI how to disable filters.
In our parameter annotations, we include explicit instructions for turning each filter off. For example, in list_messages, we document:
Subject string `json:"subject,omitempty" jsonschema:"Filter messages by subject. Returns messages whose subject contains this string (case-insensitive substring match). Use empty string to disable this filter."`
This might seem verbose, but it prevents the AI agent from making incorrect assumptions about how to represent “no filter.”

AI agents don’t always understand quoting conventions, provider-specific query syntax, or URL encoding rules.
Be explicit about formatting requirements.
For example, in our search query documentation, we explicitly state how to quote search terms for each provider:

- Exact phrases belong in double quotes: "exact phrase", "pizza"
- IMAP search criteria use keywords: SUBJECT pizza
- Gmail queries use operators: subject:pizza

We also document URL encoding requirements: quotes, spaces, and colons in query strings must be percent-encoded before the query is sent to the provider.
Don’t assume the AI agent will figure this out from context. Spell it out.
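For instance, a query like subject:"pizza party" must be percent-encoded before it goes on the wire; a quick illustration with Go’s standard library:

package main

import (
	"fmt"
	"net/url"
)

func main() {
	// Quotes, spaces, and colons all need percent-encoding in a URL.
	query := `subject:"pizza party"`
	fmt.Println(url.QueryEscape(query))
	// Output: subject%3A%22pizza+party%22
}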

The MCP specification supports two transport protocols: the older SSE (Server-Sent Events) transport and its successor, StreamableHTTP.
Reality check: Many AI agents still use SSE.
A good MCP server should support both protocols. Here’s how we handle it: branch on the request’s Accept header.

- Accept: text/event-stream → SSE response
- Accept: application/json → StreamableHTTP (JSON) response

In our Go implementation, we use the MCP SDK’s StreamableHTTPHandler with JSONResponse: true, which automatically handles content negotiation.
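With the official MCP Go SDK (github.com/modelcontextprotocol/go-sdk), the wiring looks roughly like this; treat the constructor and option names as a sketch against the SDK version available as of this writing:

package main

import (
	"log"
	"net/http"

	"github.com/modelcontextprotocol/go-sdk/mcp"
)

func main() {
	server := mcp.NewServer(&mcp.Implementation{
		Name:    "example-mcp",
		Version: "1.0.0",
	}, nil)

	// JSONResponse lets the handler answer with plain JSON when the
	// client asks for application/json, and stream SSE otherwise.
	handler := mcp.NewStreamableHTTPHandler(
		func(r *http.Request) *mcp.Server { return server },
		&mcp.StreamableHTTPOptions{JSONResponse: true},
	)

	log.Fatal(http.ListenAndServe(":8080", handler))
}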

MCP requests are stateful. They’re managed by sessions: the client initializes a session, receives a session ID, and sends it back on every subsequent request.
The problem: session IDs are usually stored in a local server cache (in-memory). In a distributed system with multiple pods, a follow-up request can land on a pod that has never seen the session, and the call fails.
The solution: sticky sessions.
Configure your load balancer to route requests to the same pod based on the session ID header. This ensures all requests for a given session hit the same pod, which has the session in its local cache.
Alternative solution: Use stateless mode (which we do). The MCP SDK supports a stateless mode where each request creates a temporary session. This works well for one-time tool calls but prevents server→client requests.
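Extending the sketch above, enabling stateless mode is one more option flag (again assuming the SDK’s option names):

	// Each request gets a temporary session, so there is no pod-local
	// state to lose. Trade-off: no server→client requests.
	handler := mcp.NewStreamableHTTPHandler(
		func(r *http.Request) *mcp.Server { return server },
		&mcp.StreamableHTTPOptions{
			JSONResponse: true,
			Stateless:    true,
		},
	)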

A common question is why the Nylas MCP server does not support “elicitation”—the process where the server asks the client for missing parameters or confirmation during a tool call.
The answer is simple: in practice, elicitation just doesn’t work reliably with most MCP clients.
Here’s why:
The lifecycle of an MCP tool call is: connect → initialize → send request → receive response → finish session → disconnect. In theory, the server can send a request back to the client (an elicitation) while the tool call is still in progress. However, in reality, most clients either don’t implement this flow at all, or do so unreliably. We found that tool calls simply hang while the server waits for an elicitation response that never comes.
We tested this behavior with Postman, Cursor, and Claude Desktop. None of them handled elicitation correctly—all tool requests became stuck in this deadlock. The AI agent waited for a server response, the server waited for client confirmation, and the user was left unable to proceed.
Given these practical limitations, we decided not to use elicitation at all in our MCP server. Instead, we require all necessary parameters and confirmations to be provided up front, ensuring a smooth and predictable experience for both users and AI agents.
Testing MCP servers is tricky because you need to simulate AI agent behavior. Here are the tools we use:
Postman supports MCP requests, which is great for basic tool calling. However, as of this writing, you can’t choose between SSE and StreamableHTTP protocols—it uses whatever the server prefers.
Use case: Quick smoke tests and manual tool verification.
Claude Desktop only supports stdio MCP servers, not remote HTTP servers. To test with Claude Desktop, you need to bridge the remote server to stdio with a proxy launched via the npx tool (mcp-remote is a common choice).
Use case: Testing with a real AI agent in a desktop environment.
Cursor supports remote MCP servers with Bearer token authentication. This is our primary testing environment. Integration instructions are in our documentation.
Important: Cursor has an upper limit of 40 tools. Nylas MCP has 18 tools, so you need to turn off other MCP integrations if your total tool count exceeds 40.
Use case: Primary development and testing environment.
ChatGPT only supports remote MCP servers with basic auth. Since Nylas uses Bearer token authentication with API keys, ChatGPT cannot integrate with Nylas MCP server.
Use case: Not applicable.
As of this writing, Gemini doesn’t work well with any MCP server. We don’t recommend using it for testing.

Here’s a pro tip that saved us from many bugs: test in multiple languages.
AI agents process instructions differently depending on the language. A tool description that works perfectly in English might be ambiguous in another language. We test with both English and a second, non-English language.
If both work, we have confidence that our tool descriptions are clear and unambiguous. If one language fails, it usually indicates that our descriptions need to be more explicit or that we’re relying on language-specific idioms.

Building an MCP server is more than just wrapping your API. It requires rethinking your design around AI agent use cases, writing verbose and explicit tool descriptions, handling time with dedicated tools, and enforcing security guardrails like human confirmation at the protocol level.
The Nylas MCP server is now live and powering AI agent integrations across the ecosystem. We’ve learned a lot along the way, and we hope these lessons help you build better MCP servers.
If you’re building your own MCP server, we’d love to hear about your experiences. And if you’re using the Nylas MCP server, check out our documentation to get started.
Happy building! 🚀
Engineering Manager