Back to blog
May 1, 2026

Building Illustra — AI Image Generation with A2A Protocol

AICloud RunLangChainA2AArchitecture

Building a monorepo AI image generation app with LangChain, Gemini, and Stability AI — and the production bugs that taught me everything.

Building Illustra — AI Image Generation with A2A Protocol

The Idea

What if you could generate images by just chatting with an agent? Not a complex UI with sliders and parameters — just type what you want, and get it back.

I built Illustra as a two-service monorepo to experiment with Google's A2A (Agent-to-Agent) protocol — a JSON-RPC based communication layer that lets AI agents talk to each other in a standardized way. The goal: combine LLM prompt enhancement with Stability AI image generation, all orchestrated through clean API boundaries.

User → Illustra UI (Express + Tailwind)
         ↓ /api/generate
       Illustra Agent (LangChain + Gemini)
         ↓ tool call (prefix routing)
       Stability AI (default) or OpenAI GPT Image 2.0
         ↓ upload
       GCS Bucket → Public URL

Architecture

The system has two services, each independently deployable on Google Cloud Run:

Agent Service (`@illustra/agent`)

The brain. It runs a LangChain agent powered by Google Gemini that:

1. Receives A2A JSON-RPC requests at /a2a/invoke

2. Enhances the user's prompt using Gemini (turns "a cat" into something Stability AI can work with)

3. Calls a custom tool that hits the Stability AI API or OpenAI GPT Image 2.0 for image generation

4. Uploads the result to Google Cloud Storage

The agent supports provider prefix routing via message prefixes:

  • [openai] → OpenAI GPT Image 2.0 (default quality: low)
  • [openai][high] → OpenAI GPT Image 2.0 (high quality)
  • (no prefix) → Stability AI (default)
  • The parseInput() function extracts these prefixes before the prompt

    is enhanced by Gemini and routes to the correct tool. 5. Returns a structured A2UI response (not just text — typed UI components)

    The agent exposes an Agent Card at /.well-known/agent-card.json for A2A discovery — any compliant client can find and invoke it.

    UI Service (`@illustra/ui`)

    The face. An Express server with a Tailwind CSS single-page app that:

    1. Presents a clean text input to the user

    2. Proxies requests to the agent via the A2A protocol

    3. Parses A2UI responses and renders the image

    The UI is completely stateless — it doesn't know anything about image generation. It just speaks A2A.

    The Build

    Monorepo with Bun

    Both services live in a single repo with Bun workspaces. Shared Makefile for build and deploy targets. Biome for linting, commitlint for conventional commits.

    A2A Protocol

    The Agent-to-Agent protocol is essentially JSON-RPC 2.0 over HTTP:

    {
      "jsonrpc": "2.0",
      "id": 1,
      "method": "message/send",
      "params": {
        "message": {
          "role": "user",
          "parts": [{ "type": "text", "text": "a sunset over mountains" }]
        }
      }
    }

    The response comes back as A2UI — structured data that describes UI components:

    {
      "result": {
        "role": "assistant",
        "parts": [
          {
            "kind": "data",
            "data": {
              "type": "Image",
              "props": {
                "url": "https://storage.googleapis.com/illustra/images/1234567890.png",
                "alt": "a sunset over mountains"
              }
            }
          }
        ]
      }
    }

    Cloud Run Deployment

    Both services deploy via gcloud run deploy --source . — Google Cloud Build builds the Dockerfile automatically. Environment variables are injected via env.yaml files.

    The Difficulties

    This is where things got interesting. Everything worked perfectly locally. Production told a different story.

    The 500 Error Mystery

    After deploying both services, the UI returned a cryptic error:

    SyntaxError: Failed to parse JSON from response

    The agent worked fine when tested directly. The UI was sending the right request. But the response couldn't be parsed as JSON.

    Root cause: Three problems layered on top of each other:

    1. Missing env.yaml in Cloud Run builds — The .gcloudignore file was excluding env.yaml, so the agent deployed without API keys. It returned an HTML error page, not JSON.

    2. Port mismatch — Cloud Run defaults to port 8080. The UI was configured to listen on 3000. The deployment flags didn't match the runtime configuration.

    3. No response validation — The UI blindly called response.json() without checking if the response was actually JSON. When it got an HTML error page back, it crashed.

    The Fix

    // Before: blind JSON parse
    const data = await response.json();
    
    // After: check content-type first
    const contentType = response.headers.get("content-type") || "";
    if (!contentType.includes("application/json")) {
      const rawText = await response.text();
      console.error(`Non-JSON response (status ${response.status}):`, rawText);
      return res.status(response.status).json({
        error: `Agent returned non-JSON response`,
      });
    }
    const data = await response.json();

    Three changes total:

  • Removed env.yaml from .gcloudignore so it gets included in Cloud Run builds
  • Aligned UI's default port to 8080 (Cloud Run standard)
  • Added content-type validation before JSON parsing
  • The Lesson

    Always validate before parsing. Never assume a response is JSON just because you expect it to be. Log the raw response on failure — it's the difference between a 5-minute and a 5-hour debugging session.

    Results

    Here are some images generated by the live system:

    Prompt: a futuristic AI workspace with holographic displays and glowing neural networks
    Illustra UI
    Prompt: a cloud infrastructure diagram with connected servers and data flowing
    Cloud Infrastructure
    Prompt: a minimalist terminal screen with green text on black background showing AI code
    Terminal

    The system handles prompt enhancement, image generation, storage, and structured response delivery — all through clean API boundaries.

    OpenAI GPT Image 2.0 Comparison

    Illustra now supports OpenAI GPT Image 2.0 alongside Stability AI.

    Using the [openai] prefix, users can switch providers instantly

    without changing any UI code.

    Speed & Cost Comparison

    Attribute
    Stability AI (SDXL)
    OpenAI GPT Image 2.0 (low)
    ----------------
    -------------------
    ---------------------------
    Speed
    ~3-5s
    ~2-4s
    Text rendering
    Poor
    Excellent
    Photorealism
    Excellent
    Good
    Cost (1024×1024)
    $0.03 (Core)
    $0.006 (low)
    Best for
    Photos, art
    Diagrams, text-heavy images

    Image Comparison

    Prompt: a futuristic AI workspace with holographic displays and glowing neural networks
    OpenAI - AI Workspace
    Prompt: a cloud infrastructure diagram with connected servers and data flowing
    OpenAI - Cloud Infrastructure
    Prompt: a minimalist terminal screen with green text on black background showing AI code
    OpenAI - Terminal

    The OpenAI images were generated using [openai][low] prefix for

    quick, cost-effective generation ($0.006 per 1024×1024 image).

    What's Next

  • Streaming responses for real-time progress updates
  • Multiple image generation models ✅ (OpenAI GPT Image 2.0 now supported)
  • Image history and gallery
  • Multi-agent orchestration with different specializations

  • Illustra is open source: github.com/mnkrana/illustra