Building the Nylas MCP Server: Lessons from the Trenches

The Model Context Protocol (MCP) is revolutionizing how AI agents interact with external systems. At Nylas, we recently built and released our own MCP server that enables AI agents to seamlessly interact with email and calendar data. You can check out the official documentation to see what we built.

But here’s the thing: building an MCP server isn’t just about wrapping your existing API endpoints. It requires a fundamental shift in how you think about API design, AI agent capabilities, and security. After months of development, testing, and iterating, we’ve learned a lot—and we’re here to share those hard-won lessons with you.

Architecture Decisions: One Process, Two Ports

When we first started building the Nylas MCP server, the obvious question was: should we create a new microservice or integrate it into our existing API service?

We chose the latter. Our API service runs on port 80, and the MCP server runs on port 8080—both handled by different goroutines in the same server process.

Here’s why this decision made sense:

Performance Benefits:

  • Zero network hops: Most MCP tool calls are essentially wrappers around Nylas API endpoints. By hosting the MCP server in the same process, we eliminate the network latency of calling from one service to another.
  • Direct handler invocation: The MCP server can directly call the same handler functions used by our REST API, bypassing HTTP overhead entirely.
  • Simplified error handling: No inter-service network failures to handle, such as rate limiting, timeouts, or dropped connections.

Operational Simplicity:

  • One less service to deploy, monitor, and maintain
  • Shared middleware, authentication, and logging infrastructure
  • Easier debugging with unified traces and logs

The trade-off? Slightly more complex code organization. But for our use case, the performance and operational benefits far outweighed the complexity.
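
To make this concrete, here's a minimal sketch of the pattern using plain net/http. The handler names and paths are hypothetical, not our production code:

package main

import (
    "log"
    "net/http"
)

// eventsHandler stands in for an existing REST handler.
func eventsHandler(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("[]"))
}

// mcpHandler stands in for the real MCP handler (more on that below).
var mcpHandler http.Handler = http.NotFoundHandler()

func main() {
    apiMux := http.NewServeMux()
    apiMux.HandleFunc("/v3/events", eventsHandler) // REST API on port 80

    mcpMux := http.NewServeMux()
    mcpMux.Handle("/mcp", mcpHandler) // MCP endpoint on port 8080

    // Same process, two listeners: each port gets its own goroutine, so MCP
    // tool calls can invoke the REST handlers directly with no network hop.
    go func() {
        log.Fatal(http.ListenAndServe(":80", apiMux))
    }()
    log.Fatal(http.ListenAndServe(":8080", mcpMux))
}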

The AI Agent Mindset Shift

This was perhaps the biggest revelation during development: AI agents are not human developers. When building traditional REST APIs, we follow resource-oriented design patterns (like Google’s AIP-121). We create predictable endpoints:

  • GET /events/{id} – retrieve one event
  • GET /events – list all events (with pagination)
  • POST /events – create an event
  • PUT /events/{id} – update an event
  • DELETE /events/{id} – delete an event

This works great for humans. A developer sees POST /events and immediately understands there must be corresponding GET, PUT, and DELETE endpoints. They can read API documentation, understand HTTP methods, and navigate pagination.

AI agents don’t work this way.

An AI agent doesn’t see HTTP methods or endpoints. It doesn’t browse API documentation. Instead, it reads tool descriptions and understands tools by their purpose and use case. Each tool description must be self-contained and explicit about:

  • What the tool does
  • When to use it
  • What the effect will be
  • All possible use cases

This means your tool descriptions need to be much more verbose than your API documentation. You can’t rely on REST conventions or assume the AI will infer relationships between tools.
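
For example, with the official MCP Go SDK, a tool registration might look like the sketch below. The wording is illustrative, not our actual production description:

// Illustrative only: a self-contained description that states purpose,
// timing, effect, and use cases, instead of relying on REST conventions.
mcp.AddTool(server, &mcp.Tool{
    Name: "send_message",
    Description: "Send an email message on the user's behalf. " +
        "Use this only after the user has explicitly approved the message preview. " +
        "Effect: the message is delivered immediately to all recipients and " +
        "cannot be recalled. Use cases: replying to a thread, sending a new " +
        "message, forwarding content. To save without sending, use the draft " +
        "tools instead.",
}, sendMessageHandler) // sendMessageHandler: a hypothetical tool handler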

Less is More: Curating Your Tool Set

One common mistake we see developers make: creating an MCP tool for every single API endpoint.

Don’t do this.

Here’s why: each tool has a long, detailed description. Too many tools overwhelm the AI agent, causing it to:

  • Consume excessive tokens just reading tool descriptions
  • Lose focus and make poor tool selection decisions
  • Struggle with tool discovery

Instead, we took a use-case-driven approach. We analyzed the most popular Nylas use cases:

  • Sending emails
  • Checking availability
  • Creating and managing drafts
  • Scheduling events
  • Listing messages and threads

We deliberately didn’t build tools for edge cases like:

  • Sending RSVP responses to meeting hosts
  • Marking emails as starred
  • Advanced folder management

These are valid use cases, but they're rarely what someone wants an AI agent to do. By focusing on the 80/20 rule, we kept the count at 18 tools—manageable for both the AI agent and our maintenance burden.

Time is Hard: Building Time Management Tools

AI agents are surprisingly bad at time calculations. They:

  • Don’t know the current time
  • Make mistakes when translating between timezones
  • Struggle with epoch time conversions

We learned this the hard way during early testing. An agent would try to schedule a meeting for “tomorrow at 10 AM” and end up creating an event in the wrong timezone or on the wrong day.

Our solution: explicit time management tools.

We built three dedicated tools:

  1. current_time: Returns the current epoch time and date for a given timezone
  2. datetime_to_epoch: Converts human-readable date/time strings to Unix timestamps
  3. epoch_to_datetime: Converts Unix timestamps back to human-readable strings
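
Under the hood, the conversion can be as simple as standard library parsing. Here's a sketch of what a tool like datetime_to_epoch might do (illustrative, not our exact implementation):

import (
    "fmt"
    "time"
)

// datetimeToEpoch converts a date, clock time, and IANA timezone name
// into a Unix timestamp.
func datetimeToEpoch(date, clock, tz string) (int64, error) {
    loc, err := time.LoadLocation(tz) // e.g. "America/New_York"
    if err != nil {
        return 0, fmt.Errorf("unknown timezone %q: %w", tz, err)
    }
    t, err := time.ParseInLocation("2006-01-02 15:04:05", date+" "+clock, loc)
    if err != nil {
        return 0, fmt.Errorf("invalid date/time: %w", err)
    }
    return t.Unix(), nil
}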

We also implemented a double-check mechanism. When an agent needs to work with time (checking availability, scheduling meetings, listing events), we force it to:

  1. Ask the user for their timezone
  2. Use current_time to get the current time
  3. Use datetime_to_epoch to convert user input to epoch time
  4. Use epoch_to_datetime to convert back and verify the calculation
  5. Compare the result with the user’s original request before proceeding

This might seem verbose, but it’s the only way to ensure accuracy. Time-related bugs are some of the most frustrating for users, so the extra tool calls are worth it.

Security First: Preventing Prompt Injection Attacks

Email sending is a high-risk operation. We’ve all heard horror stories about AI agents sending emails to the wrong people or with unintended content. But there’s a more subtle threat: prompt injection attacks.

A malicious user could craft a prompt that tricks the AI agent into:

  • Sending emails without user confirmation
  • Including sensitive information in email content
  • Bypassing security checks

Our defense: mandatory human confirmation.

Before any email is sent, the MCP server forces the AI agent to:

  1. Call confirm_send_message (or confirm_send_draft) to generate a confirmation preview
  2. Display the preview to the user showing:
    • Recipients (to, cc, bcc)
    • Subject line
    • Body preview (first 200 characters)
  3. Wait for the user to type exactly: “Yes, send this message”
  4. Only proceed if the user’s response matches exactly (case-sensitive)

Any other input aborts the email sending. This ensures that every email sending operation has explicit human oversight.
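
In code, the final check is deliberately strict. A sketch:

// The confirmation phrase must match exactly, including case. Anything
// else (e.g. "yes" or "Yes, send it") aborts the send.
if userReply != "Yes, send this message" {
    return errors.New("send aborted: confirmation phrase did not match")
}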

Enforcing Tool Call Order: The Secret Hash Pattern

But wait—what if a malicious actor tries to bypass the confirmation step? They could try to inject system prompts or confirmation text directly into the email body, hoping the AI agent will interpret it as user confirmation.

Enter the secret hash pattern.

We implemented a cryptographic confirmation system:

  1. When confirm_send_message is called, it generates a deterministic hash from:
    • Message recipients (to, cc, bcc)
    • Subject line
    • Body content
    • A secret salt (stored server-side)
  2. This hash is returned to the AI agent as confirmation_hash
  3. The send_message tool requires this confirmation_hash parameter
  4. The server validates the hash by recalculating it from the message content

This means:

  • The AI agent must call the confirmation tool first to get the hash
  • The hash is cryptographically tied to the specific message content
  • System prompt injection can’t generate a valid hash without the server’s secret salt
  • Even if someone tries to inject “Yes, send this message” in the email body, they still need the correct hash

This pattern ensures tool call order is enforced at the protocol level, not just in documentation.
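
Here's a minimal sketch of such a scheme, assuming HMAC-SHA256; our production code differs in its details:

import (
    "crypto/hmac"
    "crypto/sha256"
    "encoding/hex"
    "strings"
)

// confirmationHash deterministically hashes the message fields with a
// server-side secret. Without the secret, a valid hash cannot be forged.
func confirmationHash(secret []byte, to, cc, bcc []string, subject, body string) string {
    mac := hmac.New(sha256.New, secret)
    for _, field := range []string{
        strings.Join(to, ","), strings.Join(cc, ","), strings.Join(bcc, ","),
        subject, body,
    } {
        mac.Write([]byte(field))
        mac.Write([]byte{0}) // separator so adjacent fields can't bleed together
    }
    return hex.EncodeToString(mac.Sum(nil))
}

On send_message, the server recomputes the hash from the submitted content and compares it against the agent-supplied confirmation_hash; any mismatch rejects the call.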

The Three Levels of Description

MCP servers have three distinct levels of description, each serving a different purpose:

Level 1: Server Instructions

The top-level description that explains:

  • What the MCP server does
  • When and how to use it
  • High-level tool overview
  • Important constraints or requirements

Rule: Server instructions should not contain tool descriptions or parameter details.

Level 2: Tool Descriptions

Each tool has its own description explaining:

  • What the tool does
  • When to use it
  • Use cases and examples
  • Important workflow requirements

Rule: Tool descriptions should not contain parameter annotations or implementation details.

Level 3: Parameter Annotations

Each parameter has a JSON schema annotation explaining:

  • What the parameter does
  • Format requirements
  • Optional vs required
  • How to disable filters (for optional parameters)

Rule: Keep annotations focused on the parameter itself, not the tool’s overall purpose.

This separation of concerns makes the MCP server easier for AI agents to understand. They can scan server instructions to understand the big picture, read tool descriptions to choose the right tool, and check parameter annotations for implementation details.

In Go, we use jsonschema tags to make parameters self-explanatory:

type datetimeToEpochInput struct {
    Date     string `json:"date" jsonschema:"date in YYYY-MM-DD format"`
    Time     string `json:"time" jsonschema:"time in HH:MM:SS format"`
    Timezone string `json:"timezone" jsonschema:"timezone to use for parsing (e.g., 'America/New_York')"`
}

Examples Are Your Best Friend

When writing tool descriptions, examples are crucial.

AI agents learn from examples. A well-written example can clarify:

  • Input format and syntax
  • Expected output structure
  • Common use cases
  • Edge cases and limitations

For example, our list_messages tool description includes examples for:

  • Simple listing (no filters)
  • Filtering by properties (from, to, subject, etc.)
  • Full-text search with provider-specific syntax (Gmail, Microsoft Graph, IMAP)
  • Listing messages in a specific thread

Each example shows the exact query syntax, including how to construct Gmail search queries, Microsoft Graph filters, and IMAP search criteria.

Pro tip: Include examples for all major use cases, not just the happy path. Show how to handle errors, edge cases, and provider-specific quirks.

Explicit Optional Parameters

AI agents have a habit of passing optional parameters even when they’re not needed. For example, if you’re not searching messages by subject, the agent might still try to pass subject: "" or subject: null.

The solution: explicitly tell the AI how to disable filters.

In our parameter annotations, we include instructions like:

  • “Use empty string to disable this filter”
  • “Set to 0 to disable this filter”
  • “Leave empty for simple listing”

For example, in list_messages, we document:

Subject string `json:"subject,omitempty" jsonschema:"Filter messages by subject. Returns messages whose subject contains this string (case-insensitive substring match). Use empty string to disable this filter."`

This might seem verbose, but it prevents the AI agent from making incorrect assumptions about how to represent “no filter.”
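
On the server side, the zero value then simply means "filter disabled". A sketch:

// Matches the annotation "Use empty string to disable this filter."
if input.Subject != "" {
    query.Set("subject", input.Subject) // query: a hypothetical parameter builder
}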

Don’t Assume AI Knows Formatting

AI agents don’t always understand:

  • When to use quotation marks
  • URL encoding requirements
  • Special character escaping
  • Provider-specific syntax requirements

Be explicit about formatting requirements.

For example, in our search query documentation, we explicitly state:

  • Gmail: Use quotes for exact phrases: "exact phrase"
  • Microsoft Graph: Wrap queries in quotes: "pizza"
  • IMAP: Use uppercase keywords: SUBJECT pizza
  • EWS: Use colon syntax: subject:pizza

We also document URL encoding requirements:

  • “Use URL encoding for query parameters”
  • “Special characters must be URL-encoded”

Don’t assume the AI agent will figure this out from context. Spell it out.

Protocol Support: SSE vs StreamableHTTP

The MCP specification supports two protocols:

  • SSE (Server-Sent Events): The original protocol, now deprecated but still widely used
  • StreamableHTTP: The newer protocol with better performance and features

Reality check: Many AI agents still use SSE.

A good MCP server should support both protocols. Here’s how we handle it:

  1. Content negotiation: The server checks the Accept header
    • Accept: text/event-stream → SSE response
    • Accept: application/json → StreamableHTTP (JSON) response
  2. Method-based routing:
    • GET requests typically use SSE
    • POST requests typically use JSON
  3. Fallback support: If both are accepted, prefer JSON but support SSE
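
Conceptually, the negotiation looks like this hand-rolled sketch (serveSSE and serveJSON are hypothetical helpers):

func negotiate(w http.ResponseWriter, r *http.Request) {
    accept := r.Header.Get("Accept")
    switch {
    case strings.Contains(accept, "application/json"):
        serveJSON(w, r) // single JSON response; preferred when both are accepted
    case strings.Contains(accept, "text/event-stream"):
        serveSSE(w, r) // stream the response as SSE events
    default:
        serveJSON(w, r) // fall back to JSON
    }
}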

In our Go implementation, we use the MCP SDK’s StreamableHTTPHandler with JSONResponse: true, which automatically handles content negotiation.

Session Management in Distributed Systems

MCP requests are stateful. They’re managed by sessions:

  1. AI agent connects to MCP server and initializes
  2. Server returns a session ID
  3. All subsequent requests include the session ID in headers

The problem: Session IDs are usually stored in local server cache (in-memory). In a distributed system with multiple pods:

  • Request 1 might hit Pod A and create session X
  • Request 2 might hit Pod B, which doesn’t know about session X
  • Request 2 fails with “404 session not found”

The solution: sticky sessions.

Configure your load balancer to route requests to the same pod based on the session ID header. This ensures all requests for a given session hit the same pod, which has the session in its local cache.

Alternative solution: Use stateless mode (which we do). The MCP SDK supports a stateless mode where each request creates a temporary session. This works well for one-time tool calls but prevents server→client requests.
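
With the Go SDK, stateless mode is an option on the same handler. A sketch, with option names as of the SDK version we used:

handler := mcp.NewStreamableHTTPHandler(
    func(r *http.Request) *mcp.Server { return server },
    &mcp.StreamableHTTPOptions{
        JSONResponse: true, // prefer JSON, still supporting SSE
        Stateless:    true, // each request gets a short-lived session
    },
)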

Why We Don’t Use MCP Elicitation

A common question is why the Nylas MCP server does not support “elicitation”—the process where the server asks the client for missing parameters or confirmation during a tool call.

The answer is simple: in practice, elicitation just doesn’t work reliably with most MCP clients.

Here’s why:

The lifecycle of an MCP tool call is: connect → initialize → send request → receive response → finish session → disconnect. In theory, the server can send a request back to the client (an elicitation) while the tool call is still in progress. However, in reality, most clients either don’t implement this flow at all, or do so unreliably. We found that:

  • Many clients block the UI or request loop while waiting for the tool call result, and do not process any incoming server requests (elicitation) during this time.
  • As a result, the server waits indefinitely for the client’s elicitation response, while the client waits indefinitely for the server’s tool result.
  • This creates a deadlock: the AI agent is waiting for the server, the server is waiting for the client, and the user cannot interact or confirm anything because the UI is stuck.

We tested this behavior with Postman, Cursor, and Claude Desktop. None of them handled elicitation correctly—all tool requests became stuck in this deadlock. The AI agent waited for a server response, the server waited for client confirmation, and the user was left unable to proceed.

Given these practical limitations, we decided not to use elicitation at all in our MCP server. Instead, we require all necessary parameters and confirmations to be provided up front, ensuring a smooth and predictable experience for both users and AI agents.

Testing Your MCP Server

Testing MCP servers is tricky because you need to simulate AI agent behavior. Here are the tools we use:

Postman

Postman supports MCP requests, which is great for basic tool calling. However, as of this writing, you can’t choose between SSE and StreamableHTTP protocols—it uses whatever the server prefers.

Use case: Quick smoke tests and manual tool verification.

Claude Desktop

Claude Desktop only supports stdio MCP servers, not remote HTTP servers. To test with Claude Desktop, you need to:

  1. Install npx (bundled with Node.js)
  2. Use a local proxy that converts HTTP MCP to stdio
  3. Follow the Nylas MCP integration guide

Use case: Testing with a real AI agent in a desktop environment.

Cursor

Cursor supports remote MCP servers with Bearer token authentication. This is our primary testing environment. Integration instructions are in our documentation.

Important: Cursor has an upper limit of 40 tools. Nylas MCP has 18 tools, so you need to turn off other MCP integrations if your total tool count exceeds 40.

Use case: Primary development and testing environment.

ChatGPT

ChatGPT only supports remote MCP servers with basic auth. Since Nylas uses Bearer token authentication with API keys, ChatGPT cannot integrate with Nylas MCP server.

Use case: Not applicable for our use case.

Gemini

As of this writing, Gemini doesn’t work well with any MCP server. We don’t recommend using it for testing.

Multilingual Testing

Here’s a pro tip that saved us from many bugs: test in multiple languages.

AI agents process instructions differently depending on the language. A tool description that works perfectly in English might be ambiguous in another language. We test with both:

  • English (primary language)
  • A secondary language (we use Chinese, but any language works)

If both work, we have confidence that our tool descriptions are clear and unambiguous. If one language fails, it usually indicates that our descriptions need to be more explicit or that we’re relying on language-specific idioms.

Conclusion

Building an MCP server is more than just wrapping your API. It requires:

  • Understanding how AI agents think (differently from humans)
  • Prioritizing use cases over API completeness
  • Building explicit tooling for AI weaknesses (like time calculations)
  • Implementing security measures that work with AI agents
  • Writing descriptions that are verbose but clear
  • Supporting multiple protocols and clients

The Nylas MCP server is now live and powering AI agent integrations across the ecosystem. We’ve learned a lot along the way, and we hope these lessons help you build better MCP servers.

If you’re building your own MCP server, we’d love to hear about your experiences. And if you’re using the Nylas MCP server, check out our documentation to get started.

Happy building! 🚀
