From Custom Session State to LangGraph: A Fable Migration Story
Migrating from in-memory session state to LangGraph with Firestore checkpoints: the architecture problems, build challenges, production wins, and why your agent needs persistent state

The Problem
For months, Fable — our children's story-writing AI agent — ran on custom state in server memory. It worked in development. But production told a different story.
Every deploy lost active story sessions.
Users couldn't resume half-finished tales.
Debugging meant grepping through console logs.
No streaming meant blank screens while users waited.
We needed a rewrite.
The Architecture: Before vs After
Before — In-Memory Sessions


The new flow uses LangGraph's StateGraph with multiple nodes. Each node transforms state, and after every turn, the state is checkpointed to Firestore. If the server restarts, we just fetch the latest checkpoint and resume.
Now we have:
The Implementation
1. State = Graph Annotation
Instead of a custom SessionState type, we defined FableState as a LangGraph annotation. Think of it as the "memory" that flows through the graph:
FableState = {
  sessionId, userId,
  phase, age, character, theme, setting,
  fullStorySoFar, sentencesWritten, sentenceLimit, maxSentences,
  uniquenessScore, isUnique,
  lastResponse, audioUrl, postProcessResult
}
2. Nodes = State Transformers
Each node is an async function that transforms state:
collectAge(userInput) → sets age, asks next question
collectCharacter(userInput) → sets character
generateStory(userInput) → adds to story, increments sentence count
checkUniqueness(fullStory) → queries Firestore, sets uniquenessScore
The graph routes between nodes based on the phase field.
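To make the node-and-routing idea concrete, here is a minimal sketch in plain TypeScript. It deliberately does not use LangGraph's actual API. The trimmed-down FableState, the Phase values, the prompt strings, and the runTurn helper are all assumptions made for illustration; the real graph has more fields and nodes.

```typescript
// Hypothetical, trimmed-down state; the real FableState carries more fields.
type Phase = "collectAge" | "collectCharacter" | "generateStory" | "done";

interface FableState {
  phase: Phase;
  age?: number;
  character?: string;
  fullStorySoFar: string;
  sentencesWritten: number;
  maxSentences: number;
  lastResponse: string;
}

// A node reads the current state and the user's input, and returns a partial update.
type StoryNode = (state: FableState, userInput: string) => Promise<Partial<FableState>>;

// One node per phase; each node also decides which phase comes next.
const nodes: Record<Exclude<Phase, "done">, StoryNode> = {
  collectAge: async (_state, input) => ({
    age: Number.parseInt(input, 10),
    phase: "collectCharacter",
    lastResponse: "Who is the hero of your story?", // assumed prompt text
  }),
  collectCharacter: async (_state, input) => ({
    character: input.trim(),
    phase: "generateStory",
    lastResponse: "How does the story begin?", // assumed prompt text
  }),
  generateStory: async (state, input) => {
    const sentencesWritten = state.sentencesWritten + 1;
    return {
      fullStorySoFar: `${state.fullStorySoFar} ${input.trim()}`.trim(),
      sentencesWritten,
      phase: sentencesWritten >= state.maxSentences ? "done" : "generateStory",
      lastResponse: "What happens next?", // assumed prompt text
    };
  },
};

// Route on the phase field, run the matching node, and merge its update into the state.
async function runTurn(state: FableState, userInput: string): Promise<FableState> {
  if (state.phase === "done") return state;
  const update = await nodes[state.phase](state, userInput);
  return { ...state, ...update };
}
```

Because every node returns only the fields it changed, the merge in runTurn is the single place where state is updated, which makes each turn easy to log and inspect.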
3. Checkpoints = Firestore Documents
Turn complete? Save a document:
checkpoints/
├── thread_id: "story-123"
├── checkpoint: { ... entire state ... }
└── created_at: timestamp
Resume? Fetch the checkpoint and continue from where you left off.
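The save-and-resume cycle can be sketched as follows. This is an illustration only: an in-memory array stands in for the Firestore collection, and the InMemoryCheckpointStore class and its save and latest methods are hypothetical names, not the FirestoreSaver interface. Only the document shape (thread_id, checkpoint, created_at) comes from the layout above.

```typescript
// Mirrors the checkpoint document layout described above.
interface CheckpointDoc<S> {
  thread_id: string;
  checkpoint: S;      // the entire graph state after a turn
  created_at: number; // epoch milliseconds here; a Firestore timestamp in production
}

// In-memory stand-in for the Firestore collection, so the sketch runs anywhere.
class InMemoryCheckpointStore<S> {
  private docs: CheckpointDoc<S>[] = [];

  // Called once a turn completes.
  save(threadId: string, state: S, now: number = Date.now()): void {
    // Round-trip through JSON to store a snapshot, so later mutations of `state`
    // cannot alter history. This assumes the state is JSON-serializable.
    const snapshot = JSON.parse(JSON.stringify(state)) as S;
    this.docs.push({ thread_id: threadId, checkpoint: snapshot, created_at: now });
  }

  // Called on resume: returns the newest checkpoint for the thread, or undefined.
  latest(threadId: string): S | undefined {
    let newest: CheckpointDoc<S> | undefined;
    for (const doc of this.docs) {
      if (doc.thread_id !== threadId) continue;
      if (!newest || doc.created_at >= newest.created_at) newest = doc;
    }
    return newest?.checkpoint;
  }
}
```

After a restart, a server built along these lines would call latest("story-123") and hand the returned state back to the graph. An undefined result means the thread is new and should start from the first phase.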
4. RAG = Unique Stories
Every story turn generates an embedding and queries Firestore:
"dragon found magic sword"
↓ Gemini embedding
↓ Firestore findNearest
↓ Compare against 1000+ stories
↓ Too similar? Prompt to change
5. Parallel Post-Processing
Story done? Five tasks run simultaneously — 2 seconds instead of 10:
Polish | Caption | Summary | ImagePrompt | Save
  2s   |   2s    |   2s    |     2s      |  2s
──────────────────── Parallel ────────────────────
= 2 seconds total
The Challenges
Import Path Hell
Node files in src/agent/nodes/ were importing ../utils/ when they should use ../../utils/.
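Fix: One sed command, run inside the nodes directory (the empty '' after -i is the BSD/macOS form; GNU sed takes -i with no argument):
sed -i '' 's|from "../utils/|from "../../utils/|g' *.ts
Missing Export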
LENGTH_CONFIG wasn't exported from prompts.ts.
Fix: Added the missing export.
Session Manager Remnants
Old server.ts still referenced deleted session_manager.ts.
Fix: Updated imports and endpoints to use FirestoreSaver.
Bun's @google-cloud/firestore Bug
Known Bun issue (#4746): bundling @google-cloud/firestore fails with export errors.
Fix: Use runtime require:
const Firestore = require("@google-cloud/firestore").Firestore;
Test Runner Issues
Bun's cache became corrupted, which caused symlink errors.
Fix: Clear cache and retry:
rm -rf node_modules/.bun
bun test
Key Wins
Before → After
What Changed for Users
Before:
After:
Lessons Learned
1. Custom state seems easy until it isn't
Custom session state looks simple. Then you need persistence, streaming, checkpoints, debugging — and you're rewriting anyway.
2. Agents need visible state
If you can't see your agent's state, you can't debug it.
3. Persistence is non-negotiable
Users expect stories to survive refreshes, deploys, crashes.
4. Streaming is a feature
Users want to see progress, not blank screens.
5. Parallel wins
If tasks don't depend on each other, run them simultaneously.
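The post-processing step is the concrete case here: five independent tasks of roughly two seconds each. A minimal sketch of the pattern, assuming a hypothetical fakeTask helper in place of the real polish, caption, summary, image-prompt, and save calls:

```typescript
// Hypothetical stand-in for one post-processing task; resolves after `ms` milliseconds.
function fakeTask(name: string, ms: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve(`${name} done`), ms));
}

// Start every task first, then await them together.
// Total wall-clock time tracks the slowest task rather than the sum of all tasks.
async function postProcess(
  taskNames: string[],
  msPerTask: number
): Promise<{ results: string[]; elapsedMs: number }> {
  const started = Date.now();
  const results = await Promise.all(taskNames.map((name) => fakeTask(name, msPerTask)));
  return { results, elapsedMs: Date.now() - started };
}
```

Promise.all rejects as soon as any task rejects. If one failed task should not discard the others, Promise.allSettled is the usual alternative.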
The Results
The migration took 3 weeks. The build works, tests pass, and Fable is now production-ready.
38 files changed, 1417 insertions(+), 1385 deletions(-)
Check more open source repos at: github.com/mnkrana/