If you are building an AI agent that holds email conversations over multiple turns, one exchange is the easy part. The hard part starts on the second reply. By the time it lands, the agent may have restarted, been redeployed, or simply gone hours without that conversation in active memory. Whatever it was tracking is gone, and the agent meets the reply with a blank slate.
This breaks most agent demos in production. In-memory state disappears on the next deploy. The agent gets a second reply with no record of the first. A contact goes quiet for a week and writes back. Without a durable state model and explicit lifecycle handling, the conversation collapses at the first complication.
Here is how to build the full loop: the agent sends, waits for a reply, restores context, decides what to say next, replies, and waits again. It runs on webhooks and the Threads API with no polling.
Before you start, you’ll need an Agent Account on a registered domain and a webhook endpoint Nylas can reach. The quickstart sets both up in about 5 minutes.
The conversation state model
Every active conversation needs a durable record keyed to the Nylas thread_id. When the agent sends the first message, Nylas assigns a thread_id. Every subsequent reply, inbound or outbound, lands under that same ID. It is the anchor the agent uses to find its place in any conversation, even after a restart.
const conversationRecord = {
  threadId: "nylas-thread-id",
  grantId: AGENT_GRANT_ID,
  contactEmail: "[email protected]",
  contactName: "Alice Chen",
  purpose: "demo_followup", // What started this conversation
  step: "awaiting_reply", // Where in the workflow the agent is
  turnCount: 1, // How many exchanges have happened
  maxTurns: 10, // Safety cap before escalation
  createdAt: "2026-04-14T10:00:00Z",
  lastActivityAt: "2026-04-14T10:00:00Z",
  metadata: {}, // Workflow-specific data
};
Store this in Postgres, Redis with AOF, or DynamoDB. Not in memory. A conversation can go silent for two days and resume. In-memory state will not be there when it does.
The step field drives everything. It tracks what the agent is waiting for and determines how it handles the next message. A scheduling agent might cycle through awaiting_time_selection, awaiting_rsvp, booked. A support agent might move through open, awaiting_customer, resolved. The field names are yours; the pattern is the same.
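One way to keep the step field trustworthy is to spell out which transitions are allowed and to check every proposed step against that list. The sketch below is illustrative only: TRANSITIONS and resolveNextStep are hypothetical names, the step values are the scheduling examples above, and falling back to an escalated step on an invalid transition is an assumption about how you might want to fail.

```javascript
// Hypothetical transition table for a scheduling agent.
// Keys are the current step; values are the steps it may move to.
const TRANSITIONS = new Map([
  ["awaiting_time_selection", ["awaiting_time_selection", "awaiting_rsvp", "escalated"]],
  ["awaiting_rsvp", ["awaiting_rsvp", "booked", "escalated"]],
  ["booked", []],
  ["escalated", []],
]);

// Returns the proposed step when the table allows it, otherwise "escalated".
// Useful for sanity-checking a step suggested by an LLM before storing it.
function resolveNextStep(currentStep, proposedStep) {
  const allowed = TRANSITIONS.get(currentStep) ?? [];
  return allowed.includes(proposedStep) ? proposedStep : "escalated";
}
```

A support agent would swap in its own table (open, awaiting_customer, resolved) and keep the same check.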
Start the conversation
When the agent initiates contact, it sends the first message and creates the conversation record in one sequence. The thread_id comes back in the send response.
async function startConversation({ to, subject, body, purpose, metadata }) {
  const sent = await nylas.messages.send({
    identifier: AGENT_GRANT_ID,
    requestBody: {
      to: [{ email: to.email, name: to.name }],
      subject,
      body,
    },
  });

  await db.conversations.create({
    threadId: sent.data.threadId,
    grantId: AGENT_GRANT_ID,
    contactEmail: to.email,
    contactName: to.name,
    purpose,
    step: "awaiting_reply",
    turnCount: 1,
    maxTurns: 10,
    createdAt: new Date().toISOString(),
    lastActivityAt: new Date().toISOString(),
    metadata: metadata ?? {},
  });

  return sent.data;
}
sent.data.threadId is the key. Everything else in the system runs through it: the webhook handler, the context restoration, the state updates. One note on field casing: webhook payloads arrive as raw JSON in snake_case (grant_id, thread_id), while values the SDK returns are camelCase (threadId, messageIds). Seeing both in one handler is expected, not a typo.
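If the mixed casing becomes distracting, one option is to normalize the webhook object once at the edge of the handler. This is a minimal sketch, not part of the SDK: normalizeWebhookMessage is a hypothetical helper, and the set of fields it picks is an assumption about what the handler needs.

```javascript
// Hypothetical helper: copy the snake_case fields a handler needs from a raw
// webhook message object into a camelCase shape that matches SDK responses.
function normalizeWebhookMessage(obj) {
  return {
    id: obj.id,
    grantId: obj.grant_id,
    threadId: obj.thread_id,
    from: obj.from ?? [],
    subject: obj.subject ?? "",
  };
}
```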
Handle inbound replies
The webhook handler does three things: filter out the agent’s own outbound messages, look up the conversation by thread_id, and route to the multi-turn loop if one exists. If the thread is unknown, the inbound is a fresh message and triage handles it separately.
The self-message filter is not optional. When the agent sends a reply via the API, message.created fires for that outbound message too. Without the filter, the agent reads its own replies as inbound and the loop runs against itself.
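The three steps can be sketched as a small routing function. Treat this as a sketch under assumptions rather than the reference handler: it assumes the event arrives as { type, data: { object } } with snake_case fields, and routeInbound, agentEmail, and findConversation are hypothetical names supplied by your own code. The lookup is synchronous here for brevity; a real database lookup would be awaited.

```javascript
// Decide what to do with one webhook event.
// Assumed payload shape: { type, data: { object: { id, thread_id, from: [{ email }] } } }.
function routeInbound(event, { agentEmail, findConversation }) {
  if (event.type !== "message.created") return { action: "ignore" };

  const msg = event.data.object;
  const sender = msg.from?.[0]?.email?.toLowerCase();

  // Self-message filter: the agent's own sends also fire message.created.
  if (sender === agentEmail.toLowerCase()) return { action: "ignore" };

  // Known thread: continue the multi-turn loop. Unknown thread: triage.
  const conversation = findConversation(msg.thread_id);
  return conversation
    ? { action: "continue", msg, conversation }
    : { action: "triage", msg };
}
```

Returning a decision instead of acting directly keeps the filter and the lookup easy to test without a live webhook.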
Restore context and generate a reply
Before the agent generates anything, it fetches the complete thread and reconstructs the full conversation history. The LLM needs the entire exchange to produce a contextually appropriate next message, not just the latest reply.
async function continueConversation(msg, conversation) {
  // Fetch the full body. The webhook payload only carries a snippet.
  const fullMessage = await nylas.messages.find({
    identifier: AGENT_GRANT_ID,
    messageId: msg.id,
  });

  // Pull the full thread so the LLM sees every prior exchange.
  const thread = await nylas.threads.find({
    identifier: AGENT_GRANT_ID,
    threadId: conversation.threadId,
  });

  // Fetch every message in the thread for full conversation history.
  const allMessages = await Promise.all(
    thread.data.messageIds.map((id) =>
      nylas.messages.find({ identifier: AGENT_GRANT_ID, messageId: id })
    )
  );

  // Format as a transcript the LLM can reason over.
  const transcript = allMessages
    .map((m) => m.data)
    .sort((a, b) => a.date - b.date)
    .map((m) => ({
      role: m.from[0].email === AGENT_EMAIL ? "agent" : "contact",
      body: m.body,
      date: new Date(m.date * 1000).toISOString(),
    }));

  // Check lifecycle constraints before generating anything.
  if (conversation.turnCount >= conversation.maxTurns) {
    await escalate(conversation, "max turns reached");
    return;
  }

  const replyBody = await llm.generateReply({
    purpose: conversation.purpose,
    step: conversation.step,
    transcript,
    metadata: conversation.metadata,
  });

  const sent = await nylas.messages.send({
    identifier: AGENT_GRANT_ID,
    requestBody: {
      replyToMessageId: msg.id,
      to: [{ email: conversation.contactEmail, name: conversation.contactName }],
      subject: `Re: ${thread.data.subject}`,
      body: replyBody.text,
    },
  });

  await db.conversations.update(conversation.threadId, {
    step: replyBody.nextStep ?? "awaiting_reply",
    turnCount: conversation.turnCount + 1,
    lastActivityAt: new Date().toISOString(),
    metadata: { ...conversation.metadata, ...replyBody.metadata },
  });
}
The LLM receives the full transcript and the current workflow step. It returns the reply text and a nextStep value that advances the state machine. The state update runs after the send so the step only advances when the reply goes out.
Handle lifecycle events
Most conversations do not end cleanly. Three cases come up reliably: escalation to a human, completion, and a dormant thread coming back to life.
When the agent hits its turn limit, encounters a topic it cannot handle, or detects that the contact is frustrated, hand the thread to a human. Mark the step escalated, record the reason, and notify through whatever channel your ops team monitors.
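The code in this guide calls an escalate helper without defining it. One possible shape is sketched below, assuming a db object with the same conversations.update method used elsewhere and a notify function for your alerting channel. makeEscalate, notify, and the escalationReason metadata key are all hypothetical; none of them is a Nylas API.

```javascript
// Hypothetical factory for escalate(). db and notify are injected so the same
// logic works with any store or alerting channel.
function makeEscalate({ db, notify }) {
  return async function escalate(conversation, reason) {
    // Mark the step escalated and record why.
    await db.conversations.update(conversation.threadId, {
      step: "escalated",
      lastActivityAt: new Date().toISOString(),
      metadata: { ...conversation.metadata, escalationReason: reason },
    });

    // Tell a human, through whatever channel the ops team monitors.
    await notify({
      threadId: conversation.threadId,
      contactEmail: conversation.contactEmail,
      reason,
    });
  };
}
```

Updating the record before notifying means a failed notification still leaves the thread marked escalated, so the agent does not keep replying.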
When the conversation’s purpose is fulfilled, mark it done. Future messages on the same thread need to be handled as a reopened conversation, not as a continuation of a resolved one.
async function completeConversation(conversation) {
  await db.conversations.update(conversation.threadId, {
    step: "completed",
    lastActivityAt: new Date().toISOString(),
  });
}
A contact might reply to a thread that went quiet a week ago. Resuming automatically after that much silence is usually the wrong call. Check elapsed time before continuing and escalate if the gap is too long.
const hoursSinceLastActivity =
  (Date.now() - new Date(conversation.lastActivityAt).getTime()) / 3600000;

if (hoursSinceLastActivity > 168) {
  // Over a week of silence. Escalate instead of auto-replying.
  await escalate(conversation, "dormant thread reopened after 7+ days");
  return;
}
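The three lifecycle cases can also be collected into one gate that runs before any reply is generated. This is a sketch under assumptions: lifecycleGate and the action names it returns are hypothetical, and the order of the checks is one reasonable choice rather than a requirement.

```javascript
const DORMANT_AFTER_HOURS = 168; // one week, matching the check above

// Hypothetical pre-reply gate. Given the stored conversation record and the
// current time in milliseconds, returns what to do with an inbound message.
function lifecycleGate(conversation, nowMs = Date.now()) {
  // A resolved conversation that receives mail is a reopen, not a continuation.
  if (conversation.step === "completed") return { action: "reopen" };

  const hoursIdle =
    (nowMs - new Date(conversation.lastActivityAt).getTime()) / 3600000;
  if (hoursIdle > DORMANT_AFTER_HOURS) {
    return { action: "escalate", reason: "dormant thread reopened after 7+ days" };
  }

  if (conversation.turnCount >= conversation.maxTurns) {
    return { action: "escalate", reason: "max turns reached" };
  }

  return { action: "continue" };
}
```

Passing the current time as a parameter keeps the gate deterministic in tests.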
What to watch in production
If a contact sends two messages in quick succession, a 30 to 60 second delay before responding lets you treat them as one turn rather than generating two separate replies out of order.
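One way to implement that delay is to collect inbound message IDs per thread and release a thread only after a quiet period. The sketch below keeps the batches in memory purely for illustration; in production the pending batches would live in the same durable store as the conversation record. InboundBatcher and the 45 second default are hypothetical choices within the 30 to 60 second range.

```javascript
// Hypothetical batcher: groups inbound messages per thread and releases a
// thread once quietMs have passed with no new message on it.
class InboundBatcher {
  constructor(quietMs = 45000) {
    this.quietMs = quietMs;
    this.pending = new Map(); // threadId -> { messageIds, lastSeenMs }
  }

  // Record an inbound message and restart the quiet period for its thread.
  add(threadId, messageId, nowMs = Date.now()) {
    const entry = this.pending.get(threadId) ?? { messageIds: [], lastSeenMs: nowMs };
    entry.messageIds.push(messageId);
    entry.lastSeenMs = nowMs;
    this.pending.set(threadId, entry);
  }

  // Return and clear every batch that has been quiet long enough.
  takeReady(nowMs = Date.now()) {
    const ready = [];
    for (const [threadId, entry] of this.pending) {
      if (nowMs - entry.lastSeenMs >= this.quietMs) {
        ready.push({ threadId, messageIds: entry.messageIds });
        this.pending.delete(threadId);
      }
    }
    return ready;
  }
}
```

A worker would call takeReady on a short interval and run one reply per returned batch, so two quick messages produce a single turn.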
For long threads, do not pass every message to the LLM in full. Summarize earlier messages and pass only the last three or four exchanges in full. This keeps token usage reasonable without losing the context that matters for the current step.
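A minimal sketch of that split, assuming the transcript is already sorted oldest to newest. buildContext and its summarize callback are hypothetical names; the summarizer is supplied by the caller and might be a cheaper LLM call. Note that keepFull counts individual messages, not exchanges, so adjust it if you want three or four full back-and-forth pairs.

```javascript
// Split a transcript into a summary of older messages plus the most recent
// messages kept in full. summarize is a caller-supplied function.
function buildContext(transcript, { keepFull = 4, summarize }) {
  if (transcript.length <= keepFull) {
    return { summary: null, recent: transcript };
  }
  const cut = transcript.length - keepFull;
  return {
    summary: summarize(transcript.slice(0, cut)),
    recent: transcript.slice(cut),
  };
}
```

Short threads skip the summarizer entirely, which avoids an extra model call on the first few turns.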
Deduplication and per-thread locking are not optional. Webhook redelivery and concurrent workers both show up in production. Without dedup, the agent sends duplicate replies. Without per-thread locking, two workers can generate conflicting replies on the same thread at the same time. The guide "Prevent duplicate agent replies" walks through both.
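To make the two mechanisms concrete, here is a single-process sketch. It is an illustration, not a production design: with more than one worker, both pieces need a shared store, for example a unique constraint on the message ID and a database or Redis lock per thread. markIfNew and withThreadLock are hypothetical names.

```javascript
// In-memory sketch only; state here is lost on restart and not shared
// between workers.
const seenMessageIds = new Set();
const threadQueues = new Map(); // threadId -> tail of that thread's promise chain

// Returns true the first time a message ID is seen, false on redelivery.
function markIfNew(messageId) {
  if (seenMessageIds.has(messageId)) return false;
  seenMessageIds.add(messageId);
  return true;
}

// Runs tasks for the same thread one at a time, in arrival order.
function withThreadLock(threadId, task) {
  const previous = threadQueues.get(threadId) ?? Promise.resolve();
  const result = previous.then(() => task());

  // The stored tail never rejects, so one failed task cannot block the queue.
  const tail = result.catch(() => {});
  threadQueues.set(threadId, tail);
  tail.then(() => {
    if (threadQueues.get(threadId) === tail) threadQueues.delete(threadId);
  });

  return result;
}
```

A handler would call markIfNew on the incoming message ID first, then wrap the reply logic in withThreadLock keyed by the thread ID.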
Watch for bounces, not just replies. The same webhook stream delivers message.send_success, message.send_failed, and message.bounce_detected for everything the agent sends. If a reply bounces partway through a conversation, treat it as a lifecycle event like any other: mark the thread failed or escalate, rather than leaving the agent waiting on a reply that can’t arrive.
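One way to fold delivery events into the state model is a small mapping from event type to conversation update. The event names come from the paragraph above; deliveryUpdate, the failed step value, and the shape of the returned object are assumptions for illustration.

```javascript
// Map a delivery event type to a conversation update, or null when the
// event needs no state change. The "failed" step is a hypothetical value.
function deliveryUpdate(eventType) {
  switch (eventType) {
    case "message.send_failed":
      return { step: "failed", reason: "send failed" };
    case "message.bounce_detected":
      return { step: "failed", reason: "bounce detected" };
    case "message.send_success":
    default:
      // Successful sends and unrelated events leave the conversation as-is.
      return null;
  }
}
```

The caller can apply a non-null result to the stored record or pass the reason to its escalation path instead.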
What to build next
The support agent use case and the scheduling agent use case both apply this full loop to a specific workflow, with classification logic, escalation paths, and multi-day state management across a complete ticket or booking lifecycle.