May 6, 2026

From Custom Session State to LangGraph: A Fable Migration Story

LangGraph, Firestore, State Management, RAG, AI Agents, A2A, Firebase

Migrating from in-memory session state to LangGraph with Firestore checkpoints: the architecture problems, build challenges, production wins, and why your agent needs persistent state

The Problem

For months, Fable — our children's story-writing AI agent — ran on custom state in server memory. It worked in development. But production told a different story.

Every deploy lost active story sessions.

Users couldn't resume half-finished tales.

Debugging meant grepping through console logs.

No streaming = blank screens while waiting.

We needed a rewrite.


The Architecture: Before vs After

Before — In-Memory Sessions

[Figure: "Before" diagram]
[Figure: "Architecture" diagram]

The new flow uses LangGraph's StateGraph with multiple nodes. Each node transforms state, and after every turn, the state is checkpointed to Firestore. If the server restarts, we just fetch the latest checkpoint and resume.

Now we have:

  • ✅ Persistent checkpoints in Firestore
  • ✅ Token-level streaming
  • ✅ Full state visibility
  • ✅ Resumable stories after interruptions

The Implementation

    1. State = Graph Annotation

    Instead of a custom SessionState type, we defined FableState as a LangGraph annotation. Think of it as the "memory" that flows through the graph:

    FableState = {
      sessionId, userId,
      phase, age, character, theme, setting,
      fullStorySoFar, sentencesWritten, sentenceLimit, maxSentences,
      uniquenessScore, isUnique,
      lastResponse, audioUrl, postProcessResult
    }
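As a rough illustration, the state shape above could be written as a plain TypeScript interface. The post lists only the field names, so every type, the `FablePhase` values, the `createInitialState` helper, and the default limits below are assumptions made for this sketch. It also does not use LangGraph's annotation API.

```typescript
// Hypothetical phase names; the post only says a `phase` field exists.
type FablePhase = "collectAge" | "collectCharacter" | "generateStory" | "done";

// Field names come from the post; all types are assumptions.
interface FableState {
  sessionId: string;
  userId: string;
  phase: FablePhase;
  age: number | null;
  character: string | null;
  theme: string | null;
  setting: string | null;
  fullStorySoFar: string;
  sentencesWritten: number;
  sentenceLimit: number;
  maxSentences: number;
  uniquenessScore: number | null;
  isUnique: boolean | null;
  lastResponse: string;
  audioUrl: string | null;
  postProcessResult: Record<string, string> | null;
}

// Hypothetical helper: builds the empty state for a new story session.
function createInitialState(sessionId: string, userId: string): FableState {
  return {
    sessionId,
    userId,
    phase: "collectAge",
    age: null,
    character: null,
    theme: null,
    setting: null,
    fullStorySoFar: "",
    sentencesWritten: 0,
    sentenceLimit: 10, // assumed default
    maxSentences: 20, // assumed default
    uniquenessScore: null,
    isUnique: null,
    lastResponse: "",
    audioUrl: null,
    postProcessResult: null,
  };
}
```

Keeping not-yet-collected fields as `null` rather than optional makes it obvious in a saved checkpoint which questions are still unanswered.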

    2. Nodes = State Transformers

    Each node is an async function that transforms state:

    collectAge(userInput) → sets age, asks next question
    collectCharacter(userInput) → sets character
    generateStory(userInput) → adds to story, increments sentence count
    checkUniqueness(fullStory) → queries Firestore, sets uniquenessScore

    The graph routes between nodes based on the phase field.
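The node-and-routing idea can be sketched in plain TypeScript without any graph library. This is a simplified stand-in, not the LangGraph API: the `TurnNode` type, the phase names, the prompts, and the `runTurn` helper are all hypothetical, and `generateStory` appends the input verbatim where a real node would call the model.

```typescript
type Phase = "age" | "character" | "story" | "done"; // assumed phase names

interface StoryState {
  phase: Phase;
  age: number | null;
  character: string | null;
  fullStorySoFar: string;
  sentencesWritten: number;
  maxSentences: number;
  lastResponse: string;
}

// A node reads the current state and user input and returns only the fields it changes.
type TurnNode = (state: StoryState, userInput: string) => Promise<Partial<StoryState>>;

const collectAge: TurnNode = async (_state, userInput) => {
  const age = Number.parseInt(userInput, 10);
  if (Number.isNaN(age)) {
    return { lastResponse: "How old are you?" }; // stay in the same phase and ask again
  }
  return { age, phase: "character", lastResponse: "Who is the hero of your story?" };
};

const collectCharacter: TurnNode = async (_state, userInput) => ({
  character: userInput.trim(),
  phase: "story",
  lastResponse: "How does the story begin?",
});

const generateStory: TurnNode = async (state, userInput) => {
  // Placeholder for the model call: append the input as the next sentence.
  const sentencesWritten = state.sentencesWritten + 1;
  return {
    fullStorySoFar: `${state.fullStorySoFar} ${userInput}`.trim(),
    sentencesWritten,
    phase: sentencesWritten >= state.maxSentences ? "done" : "story",
    lastResponse: userInput,
  };
};

// Routing table: the `phase` field decides which node handles the turn.
const nodesByPhase: Record<Exclude<Phase, "done">, TurnNode> = {
  age: collectAge,
  character: collectCharacter,
  story: generateStory,
};

async function runTurn(state: StoryState, userInput: string): Promise<StoryState> {
  if (state.phase === "done") return state;
  const update = await nodesByPhase[state.phase](state, userInput);
  return { ...state, ...update }; // merge the partial update into a new state object
}
```

Because each node returns a partial update instead of mutating the state, every turn produces a fresh object that can be saved as a checkpoint.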

    3. Checkpoints = Firestore Documents

    Turn complete? Save a document:

    checkpoints/
    ├── thread_id: "story-123"
    ├── checkpoint: { ... entire state ... }
    └── created_at: timestamp

    Resume? Fetch the checkpoint and continue from where you left off.
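The save-and-resume cycle can be illustrated with an in-memory stand-in for the checkpoint collection. The document shape (`thread_id`, `checkpoint`, `created_at`) follows the layout above. The `InMemoryCheckpointStore` class and its method names are hypothetical, and a real implementation would write to Firestore instead of an array.

```typescript
interface CheckpointDoc<S> {
  thread_id: string;
  checkpoint: S; // the entire state at the end of a turn
  created_at: number; // epoch milliseconds
}

// Hypothetical stand-in for the Firestore-backed saver described in the post.
class InMemoryCheckpointStore<S> {
  private docs: CheckpointDoc<S>[] = [];

  // Save a snapshot after a turn. The JSON round-trip copies the state so
  // later mutations by the caller cannot change what was stored.
  save(threadId: string, state: S, now: number = Date.now()): void {
    const snapshot = JSON.parse(JSON.stringify(state)) as S;
    this.docs.push({ thread_id: threadId, checkpoint: snapshot, created_at: now });
  }

  // Resume: return the newest checkpoint for a thread, or undefined if none exists.
  latest(threadId: string): S | undefined {
    let newest: CheckpointDoc<S> | undefined;
    for (const doc of this.docs) {
      if (doc.thread_id !== threadId) continue;
      if (!newest || doc.created_at >= newest.created_at) newest = doc;
    }
    return newest?.checkpoint;
  }
}
```

Usage is a `save` at the end of each turn and a `latest` call when a request arrives for a thread the server has no state for, for example after a restart.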

    4. RAG = Unique Stories

    Every story turn generates an embedding and queries Firestore:

    "dragon found magic sword"
        ↓ Gemini embedding
        ↓ Firestore findNearest
        ↓ Compare against 1000+ stories
        ↓ Too similar? Prompt to change
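To make the "too similar?" decision concrete, here is a local brute-force version of the comparison. In the pipeline above the nearest-neighbour search happens inside Firestore and the vectors come from Gemini, so this sketch swaps in a plain cosine-similarity loop over an array. The `checkUniqueness` signature, the 0.85 threshold, and the definition of `uniquenessScore` as one minus the highest similarity are assumptions.

```typescript
// Cosine similarity between two equal-length vectors, in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat a zero vector as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface UniquenessResult {
  uniquenessScore: number; // assumed definition: 1 - highest similarity found
  isUnique: boolean;
}

// Compare a candidate embedding against stored story embeddings.
// Negative similarities are clamped to 0, so the score stays in [0, 1].
function checkUniqueness(
  candidate: number[],
  existing: number[][],
  threshold = 0.85, // assumed cut-off for "too similar"
): UniquenessResult {
  let maxSimilarity = 0;
  for (const stored of existing) {
    maxSimilarity = Math.max(maxSimilarity, cosineSimilarity(candidate, stored));
  }
  return { uniquenessScore: 1 - maxSimilarity, isUnique: maxSimilarity < threshold };
}
```

With no stored stories the candidate is trivially unique. The threshold is the knob that decides how often the agent prompts the child to change direction.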

    5. Parallel Post-Processing

    Story done? Five tasks run simultaneously — 2 seconds instead of 10:

    Polish | Caption | Summary | ImagePrompt | Save
      2s       2s        2s         2s          2s
      ────────────────────────Parallel──────────────────────
                        = 2 seconds total
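The timing in the diagram follows from starting all five tasks before awaiting any of them. A minimal sketch with `Promise.all`, where the task names mirror the diagram and `fakeTask` is a hypothetical placeholder that just sleeps instead of doing real work:

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for a real post-processing step.
async function fakeTask(name: string, ms: number): Promise<string> {
  await sleep(ms);
  return `${name} done`;
}

// Start all five independent tasks at once and wait for the slowest one.
async function runPostProcessing(
  taskMs: number,
): Promise<{ results: string[]; elapsedMs: number }> {
  const started = Date.now();
  const results = await Promise.all([
    fakeTask("polish", taskMs),
    fakeTask("caption", taskMs),
    fakeTask("summary", taskMs),
    fakeTask("imagePrompt", taskMs),
    fakeTask("save", taskMs),
  ]);
  return { results, elapsedMs: Date.now() - started };
}
```

Total time is roughly that of the slowest task rather than the sum. Note that `Promise.all` rejects as soon as any task fails; `Promise.allSettled` is the alternative when a failed caption should not discard a successful save.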

    The Challenges

    Import Path Hell

    Node files in src/agent/nodes/ were importing ../utils/ when they should use ../../utils/.

    Fix: One sed command (the -i '' form is BSD/macOS sed; GNU sed takes -i without the empty string):

    sed -i '' 's|from "../utils/|from "../../utils/|g' *.ts

    Missing Export

    LENGTH_CONFIG wasn't exported from prompts.ts.

    Fix: Added the missing export.

    Session Manager Remnants

    The old server.ts still referenced the deleted session_manager.ts.

    Fix: Updated imports and endpoints to use FirestoreSaver.

    Bun's @google-cloud/firestore Bug

    Known Bun issue (#4746): bundling @google-cloud/firestore fails with export errors.

    Fix: Use runtime require:

    const Firestore = require("@google-cloud/firestore").Firestore;

    Test Runner Issues

    Bun's cache was corrupted, which caused symlink bugs.

    Fix: Clear cache and retry:

    rm -rf node_modules/.bun
    bun test

    Key Wins

    Before → After

    | Feature          | Before (Custom)       | After (LangGraph) |
    | ---------------- | --------------------- | ----------------- |
    | **Persistence**  | RAM (lost on restart) | Firestore         |
    | **Streaming**    | Turn-level            | Token-level       |
    | **Debugging**    | console.log           | Query checkpoints |
    | **Post-process** | 10s sequential        | 2s parallel       |
    | **Uniqueness**   | None                  | RAG query         |

    What Changed for Users

    Before:

  • Refresh the page = start over from the beginning
  • Deploy = lose all stories in progress
  • No idea what the agent was doing
    After:

  • Refresh = resume seamlessly from where you left off
  • Deploy = stories survive, pick up exactly where they stopped
  • Full visibility into every checkpoint

Lessons Learned

    1. Custom state seems easy until it isn't

    Custom session state looks simple. Then you need persistence, streaming, checkpoints, debugging — and you're rewriting anyway.

    2. Agents need visible state

    If you can't see your agent's state, you can't debug it.

    3. Persistence is non-negotiable

    Users expect stories to survive refreshes, deploys, crashes.

    4. Streaming is a feature

    Users want to see progress, not blank screens.

    5. Parallel wins

    If tasks don't depend on each other, run them simultaneously.


    The Results

    The migration took 3 weeks. The build works, tests pass, and Fable is now production-ready.

    38 files changed, 1417 insertions(+), 1385 deletions(-)

    _Check more open source repos at: github.com/mnkrana/_