<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Mayank Rana - Blog</title>
    <link>https://www.mayankrana.in/blog</link>
    <description>Technical deep-dives and insights on game development, Web3, AI/ML, and engineering</description>
    <language>en-us</language>
    <lastBuildDate>Sun, 03 May 2026 12:44:41 GMT</lastBuildDate>
    <atom:link href="https://www.mayankrana.in/feed.xml" rel="self" type="application/rss+xml" />
    <generator>Mayank Rana Portfolio</generator>
    <ttl>60</ttl>
    <image>
      <url>https://www.mayankrana.in/og-image.svg</url>
      <title>Mayank Rana - Blog</title>
      <link>https://www.mayankrana.in/blog</link>
    </image>

    <item>
      <title>Building Illustra — AI Image Generation with A2A Protocol</title>
      <link>https://www.mayankrana.in/blog/building-illustra</link>
      <guid isPermaLink="true">https://www.mayankrana.in/blog/building-illustra</guid>
      <pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate>
      <description>Building a monorepo AI image generation app with LangChain, Gemini, and Stability AI — and the production bugs that taught me everything.</description>
      <content:encoded><![CDATA[<p><img src="https://www.mayankrana.in/blog-demo.png" alt="Building Illustra — AI Image Generation with A2A Protocol" /></p><h2>The Idea</h2><p>What if you could generate images by just chatting with an agent? Not a complex UI with sliders and parameters — just type what you want, and get it back.</p><p>I built <strong>Illustra</strong> as a two-service monorepo to experiment with Google&apos;s <strong>A2A (Agent-to-Agent) protocol</strong> — a JSON-RPC based communication layer that lets AI agents talk to each other in a standardized way. The goal: combine LLM prompt enhancement with Stability AI image generation, all orchestrated through clean API boundaries.</p><pre><code class="language-">User → Illustra UI (Express + Tailwind)
         ↓ /api/generate
       Illustra Agent (LangChain + Gemini)
         ↓ tool call
       Stability AI (image generation)
         ↓ upload
        GCS Bucket → Public URL</code></pre><h2>Architecture</h2><p>The system has two services, each independently deployable on Google Cloud Run:</p><h3>Agent Service (<code>@illustra/agent</code>)</h3><p>The brain. It runs a LangChain agent powered by <strong>Google Gemini</strong> that:</p><ul><li>Receives A2A JSON-RPC requests at <code>/a2a/invoke</code></li><li>Enhances the user&apos;s prompt using Gemini (turns &quot;a cat&quot; into something Stability AI can work with)</li><li>Calls a custom tool that hits the <strong>Stability AI API</strong> for image generation</li><li>Uploads the result to <strong>Google Cloud Storage</strong></li><li>Returns a structured A2UI response (not just text — typed UI components)</li></ul><p>The agent exposes an <strong>Agent Card</strong> at <code>/.well-known/agent-card.json</code> for A2A discovery — any compliant client can find and invoke it.</p>
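<p>Under the hood, the agent&apos;s job is three async steps: enhance, generate, upload. Here&apos;s a minimal sketch of that flow. The helper names, model id, and bucket path are illustrative assumptions, not the actual Illustra source:</p><pre><code class="language-typescript">import { ChatGoogleGenerativeAI } from &quot;@langchain/google-genai&quot;;
import { Storage } from &quot;@google-cloud/storage&quot;;

const llm = new ChatGoogleGenerativeAI({ model: &quot;gemini-1.5-flash&quot; });
const storage = new Storage();

// 1. Let Gemini expand a terse prompt into something Stability AI can work with
async function enhancePrompt(userPrompt: string): Promise&lt;string&gt; {
  const result = await llm.invoke(
    `Rewrite this as a detailed, vivid image-generation prompt: ${userPrompt}`
  );
  return String(result.content);
}

// 2. Call Stability AI (the v2beta core endpoint returns raw image bytes)
async function generateImage(prompt: string): Promise&lt;Buffer&gt; {
  const form = new FormData();
  form.append(&quot;prompt&quot;, prompt);
  form.append(&quot;output_format&quot;, &quot;png&quot;);
  const res = await fetch(&quot;https://api.stability.ai/v2beta/stable-image/generate/core&quot;, {
    method: &quot;POST&quot;,
    headers: {
      Authorization: `Bearer ${process.env.STABILITY_API_KEY}`,
      Accept: &quot;image/*&quot;,
    },
    body: form,
  });
  if (!res.ok) throw new Error(`Stability API error: ${res.status}`);
  return Buffer.from(await res.arrayBuffer());
}

// 3. Upload to GCS; the public URL is what the A2UI Image part points at
async function uploadImage(png: Buffer): Promise&lt;string&gt; {
  const objectPath = `images/${Date.now()}.png`;
  await storage.bucket(&quot;illustra&quot;).file(objectPath).save(png, { contentType: &quot;image/png&quot; });
  return `https://storage.googleapis.com/illustra/${objectPath}`;
}</code></pre><p>In the real service, the generation step sits behind the custom LangChain tool mentioned above, so the agent decides when to invoke it.</p>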
<h3>UI Service (<code>@illustra/ui</code>)</h3><p>The face. An Express server with a Tailwind CSS single-page app that:</p><ul><li>Presents a clean text input to the user</li><li>Proxies requests to the agent via the A2A protocol</li><li>Parses A2UI responses and renders the image</li></ul><p>The UI is completely stateless — it doesn&apos;t know anything about image generation. It just speaks A2A.</p><h2>The Build</h2><h3>Monorepo with Bun</h3><p>Both services live in a single repo with Bun workspaces. Shared <code>Makefile</code> for build and deploy targets. Biome for linting, commitlint for conventional commits.</p><h3>A2A Protocol</h3><p>The Agent-to-Agent protocol is essentially JSON-RPC 2.0 over HTTP:</p><pre><code class="language-json">{
  &quot;jsonrpc&quot;: &quot;2.0&quot;,
  &quot;id&quot;: 1,
  &quot;method&quot;: &quot;message/send&quot;,
  &quot;params&quot;: {
    &quot;message&quot;: {
      &quot;role&quot;: &quot;user&quot;,
      &quot;parts&quot;: [{&quot;type&quot;: &quot;text&quot;, &quot;text&quot;: &quot;a sunset over mountains&quot;}]
    }
  }
}</code></pre>
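<p>Sending that envelope is a plain HTTP POST. A sketch of the client call, assuming the agent&apos;s base URL lives in an <code>AGENT_URL</code> environment variable:</p><pre><code class="language-typescript">// Minimal A2A client call: POST a JSON-RPC envelope, read back a JSON-RPC result
async function sendMessage(text: string) {
  const res = await fetch(`${process.env.AGENT_URL}/a2a/invoke`, {
    method: &quot;POST&quot;,
    headers: { &quot;Content-Type&quot;: &quot;application/json&quot; },
    body: JSON.stringify({
      jsonrpc: &quot;2.0&quot;,
      id: 1,
      method: &quot;message/send&quot;,
      params: { message: { role: &quot;user&quot;, parts: [{ type: &quot;text&quot;, text }] } },
    }),
  });
  return res.json();
}</code></pre>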
<p>The response comes back as A2UI — structured data that describes UI components:</p><pre><code class="language-json">{
  &quot;result&quot;: {
    &quot;role&quot;: &quot;assistant&quot;,
    &quot;parts&quot;: [{
      &quot;kind&quot;: &quot;data&quot;,
      &quot;data&quot;: {
        &quot;type&quot;: &quot;Image&quot;,
        &quot;props&quot;: {
          &quot;url&quot;: &quot;https://storage.googleapis.com/illustra/images/1234567890.png&quot;,
          &quot;alt&quot;: &quot;a sunset over mountains&quot;
        }
      }
    }]
  }
}</code></pre>
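<p>On the UI side, rendering reduces to walking <code>parts</code> and mapping component types to markup. A sketch, with the types inferred from the example above rather than taken from a formal A2UI schema:</p><pre><code class="language-typescript">// Inferred shape of an A2UI data part (field names from the response above)
interface A2UIPart {
  kind: string;
  data?: { type: string; props: Record&lt;string, string&gt; };
}

// Render every Image component in the response as an &lt;img&gt; tag
function renderParts(parts: A2UIPart[]): string {
  return parts
    .filter((p) =&gt; p.kind === &quot;data&quot; &amp;&amp; p.data?.type === &quot;Image&quot;)
    .map((p) =&gt; `&lt;img src=&quot;${p.data!.props.url}&quot; alt=&quot;${p.data!.props.alt}&quot; /&gt;`)
    .join(&quot;&quot;);
}</code></pre>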
<h3>Cloud Run Deployment</h3><p>Both services deploy via <code>gcloud run deploy --source .</code> — Google Cloud Build builds the container image from the Dockerfile automatically. Environment variables are injected via <code>env.yaml</code> files.</p><h2>The Difficulties</h2><p>This is where things got interesting. Everything worked perfectly locally. Production told a different story.</p><h3>The 500 Error Mystery</h3><p>After deploying both services, the UI returned a cryptic error:</p><pre><code class="language-">SyntaxError: Failed to parse JSON from response</code></pre><p>The agent worked fine when tested directly. The UI was sending the right request. But the response couldn&apos;t be parsed as JSON.</p><p><strong>Root cause</strong>: three problems layered on top of each other:</p><ul><li><strong>Missing env.yaml in Cloud Run builds</strong> — The <code>.gcloudignore</code> file was excluding <code>env.yaml</code>, so the agent deployed without API keys. It returned an HTML error page, not JSON.</li><li><strong>Port mismatch</strong> — Cloud Run defaults to port <code>8080</code>. The UI was configured to listen on <code>3000</code>. The deployment flags didn&apos;t match the runtime configuration.</li><li><strong>No response validation</strong> — The UI blindly called <code>response.json()</code> without checking if the response was actually JSON. When it got an HTML error page back, it crashed.</li></ul><h3>The Fix</h3><pre><code class="language-typescript">// Before: blind JSON parse
const data = await response.json();

// After: check content-type first
const contentType = response.headers.get(&quot;content-type&quot;) || &quot;&quot;;
if (!contentType.includes(&quot;application/json&quot;)) {
  const rawText = await response.text();
  console.error(`Non-JSON response (status ${response.status}):`, rawText);
  return res.status(response.status).json({
    error: &quot;Agent returned non-JSON response&quot;
  });
}
const data = await response.json();</code></pre>
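<p>The port fix was even smaller: bind to the <code>PORT</code> that Cloud Run injects instead of hardcoding <code>3000</code>. A sketch, assuming an Express <code>app</code>:</p><pre><code class="language-typescript">// Cloud Run injects PORT (8080 by default); honor it instead of hardcoding 3000
const port = Number(process.env.PORT) || 8080;
app.listen(port, () =&gt; console.log(`UI listening on ${port}`));</code></pre>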
<p>Three changes total:</p><ul><li>Removed <code>env.yaml</code> from <code>.gcloudignore</code> so it gets included in Cloud Run builds</li><li>Aligned the UI&apos;s default port to <code>8080</code> (the Cloud Run standard)</li><li>Added content-type validation before JSON parsing</li></ul><h3>The Lesson</h3><p><strong>Always validate before parsing.</strong> Never assume a response is JSON just because you expect it to be. Log the raw response on failure — it&apos;s the difference between a 5-minute and a 5-hour debugging session.</p><h2>Results</h2><p>Here are some images generated by the live system:</p><blockquote><strong>Prompt:</strong> <code>a futuristic AI workspace with holographic displays and glowing neural networks</code></blockquote><p><img src="https://storage.googleapis.com/illustra/images/1777783207516.png" alt="Illustra UI" /></p><blockquote><strong>Prompt:</strong> <code>a cloud infrastructure diagram with connected servers and data flowing</code></blockquote><p><img src="https://storage.googleapis.com/illustra/images/1777783228474.png" alt="Cloud Infrastructure" /></p><blockquote><strong>Prompt:</strong> <code>a minimalist terminal screen with green text on black background showing AI code</code></blockquote><p><img src="https://storage.googleapis.com/illustra/images/1777783226602.png" alt="Terminal" /></p><p>The system handles prompt enhancement, image generation, storage, and structured response delivery — all through clean API boundaries.</p><h2>What&apos;s Next</h2><ul><li>Streaming responses for real-time progress updates</li><li>Multiple image generation models (DALL-E, Midjourney API)</li><li>Image history and gallery</li><li>Multi-agent orchestration with different specializations</li></ul><p><em>Illustra is open source: <a href="https://github.com/mnkrana/illustra">github.com/mnkrana/illustra</a></em></p>]]></content:encoded>
      <category>AI</category>
      <category>Cloud Run</category>
      <category>LangChain</category>
      <category>A2A</category>
      <category>Architecture</category>
    </item>
  </channel>
</rss>