The era of AI guessing games is over. At Google Cloud Builder Day Bengaluru, I put a messy warehouse shelf on the main screen. Standard multimodal models predicted tokens and guessed the inventory count. My agent didn't guess — it wrote Python, executed it in a sandbox, and counted exactly.
This is the complete architectural breakdown of that system. It combines three Google Cloud primitives — Gemini 3 Flash with Code Execution, AlloyDB ScaNN for vector search, and the open Agent2Agent (A2A) Protocol — into a workflow that is not just intelligent, but mathematically verifiable. This codelab was also featured as part of Code Vipassana Season 14, Google's hands-on builder series.
The Hallucination Problem in Visual Commerce
Imagine a warehouse shelf stacked with identical cardboard boxes. You snap a photo and send it to your AI system. The model processes the image as a static snapshot and returns a number.
The Core Flaw: A standard multimodal LLM doesn't count items — it predicts the most statistically likely token given the image. It is making an educated guess based on its training distribution, not performing a measurement.
In a supply chain, this distinction is catastrophic. If the AI says you have 12 boxes when you actually have 15, that error does not stay contained. It cascades into your ERP system, triggers false stockouts, generates unnecessary reorders, and costs real money. For Operational Leadership to approve an AI deployment, the system must provide mathematically verifiable outputs — not plausible estimates.
This is the core premise of Deterministic AI Engineering: we do not prompt the model to predict an answer. We prompt the model to write a program that computes the answer.
Architecture Breakdown: Three Primitives, One Autonomous Loop
The system is built around a Think → Act → Observe loop. Each component handles a distinct concern: vision, memory, and communication. Together, they form an autonomous supply chain that can detect an inventory shortage and place a supplier order without human intervention.
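The loop itself is easy to sketch. Below is a minimal, framework-free Python illustration of the Think → Act → Observe cycle; the agent class, action names, and stopping condition are hypothetical placeholders for exposition, not the actual demo code.

```python
from dataclasses import dataclass, field

@dataclass
class AgentLoop:
    """Minimal Think -> Act -> Observe skeleton (illustrative only)."""
    goal: str
    observations: list = field(default_factory=list)

    def think(self) -> str:
        # Decide the next action from the goal and past observations.
        # A real agent would call an LLM here; this is a stub policy.
        return "count_inventory" if not self.observations else "done"

    def act(self, action: str):
        # Execute the chosen action (e.g. run generated code in a sandbox).
        if action == "count_inventory":
            return {"boxes": 15}  # stand-in for a code-verified count
        return None

    def run(self, max_steps: int = 5):
        for _ in range(max_steps):
            action = self.think()
            if action == "done":
                break
            self.observations.append(self.act(action))  # observe
        return self.observations

loop = AgentLoop(goal="reconcile shelf inventory")
print(loop.run())  # -> [{'boxes': 15}]
```

The key design property: the loop terminates only when an *observation* (not a prediction) satisfies the goal, which is what lets each downstream action be audited.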

The Eyes: Gemini 3 Flash + Code Execution
The Vision Agent uses Gemini 3 Flash with two critical capabilities enabled: ThinkingConfig (set to LOW level for optimal cost-performance balance) and the code_execution tool.
The Key Insight: Instead of asking the model “How many boxes do you see?”, we ask it to write a Python script using OpenCV that physically measures and counts pixel regions. The model reasons about the approach, generates the code, and executes it in a secure serverless sandbox.
The result is tagged as “Code-Verified” — a hard, reproducible count backed by deterministic computer vision, not token prediction. This is the fundamental architectural shift from generative AI to agentic AI.
- ThinkingConfig LOW: Provides just enough reasoning budget to plan the OpenCV script structure without the 2–3x latency overhead of higher thinking levels. Reserve HIGH for genuinely complex multi-step reasoning.
- Secure Sandbox: Code execution runs in an isolated Cloud Run environment. The model cannot read your filesystem, call external APIs, or persist state — safe by design.
- Deterministic Output: The same image always produces the same count. No stochastic variance. Fully auditable.
The Memory: AlloyDB ScaNN for Millisecond Vector Retrieval
Once the Vision Agent has a deterministic count, it needs to act on that information — specifically, to identify which supplier stocks the detected part. The Supplier Agent handles this using AlloyDB AI with ScaNN (Scalable Nearest Neighbors).
Why not standard HNSW? As your inventory grows to tens of millions of parts, HNSW indexes become too large for RAM. ScaNN uses Vector Quantization to compress the index, fitting it into the CPU's L2 cache. The result: up to 10x faster filtered search and a 3–4x smaller memory footprint, verified in official Google Cloud benchmarks.
The search is executed with a single SQL operator: <=> (cosine distance). The Vision Agent's text description of the identified item is embedded using Vertex AI text-embedding-005, then queried directly against the inventory table.
This allows the system to perform sub-second reasoning directly via SQL — no application-layer data extraction, no pagination, no Python-side similarity logic. The database does the work it was built to do.
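To show what the `<=>` operator actually computes, here is a hedged sketch: the SQL is a representative pgvector-style query against a hypothetical `inventory` table (table and column names are my assumptions), and the NumPy helper reproduces cosine distance — 1 minus cosine similarity — so the operator's semantics are explicit.

```python
import numpy as np

# Representative query shape (hypothetical table/columns). In AlloyDB,
# <=> returns the cosine distance between the stored and query vectors.
QUERY = """
SELECT part_id, supplier_id,
       embedding <=> $1 AS distance
FROM inventory
ORDER BY distance
LIMIT 5;
"""

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """What <=> computes: 1 - cos(theta) between the two vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([1.0, 0.0, 2.0])
print(cosine_distance(v, v))   # identical vectors -> 0.0
print(cosine_distance(v, -v))  # opposite vectors  -> 2.0
```

Because `ORDER BY distance LIMIT 5` runs inside the database, the ScaNN index prunes candidates before any rows leave AlloyDB — which is why no application-layer similarity code is needed.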
The Handshake: The Agent2Agent (A2A) Protocol
The Vision Agent and Supplier Agent run on separate servers. They may be built on entirely different frameworks. The A2A Protocol — an open standard initiated by Google and now housed in the Linux Foundation — is what allows them to discover each other and collaborate without custom SDK integrations.
MCP (Model Context Protocol)
Standardizes how AI applications communicate with external tools, APIs, and databases. Think of it as the USB standard for AI tools.
A2A Protocol
Standardizes how AI agents collaborate with each other. Agents discover capabilities via agent_card.json, negotiate modalities, and delegate tasks without exposing internal state.
Each agent serves a /.well-known/agent-card.json file describing its name, skills, and endpoints. The Control Tower reads these cards at runtime — meaning you can add a third “Logistics Agent” tomorrow and the system discovers it automatically. Zero config changes. This is plug-and-play agent composition.
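A minimal sketch of card-based discovery, assuming a simplified card shape (name, skills, endpoint). Real A2A agent cards carry more fields and are fetched over HTTP from each agent's `/.well-known/agent-card.json`; treat the structure below as illustrative only.

```python
import json

# Two hypothetical agent cards, as might be served at
# /.well-known/agent-card.json on each agent's host.
CARDS = [
    json.dumps({"name": "vision-agent", "skills": ["count_inventory"],
                "endpoint": "https://vision.example.com/a2a"}),
    json.dumps({"name": "supplier-agent", "skills": ["find_supplier"],
                "endpoint": "https://supplier.example.com/a2a"}),
]

class ControlTower:
    """Registers agents from their cards and routes tasks by skill."""
    def __init__(self):
        self.registry = {}  # skill -> endpoint

    def register(self, card_json: str):
        card = json.loads(card_json)
        for skill in card["skills"]:
            self.registry[skill] = card["endpoint"]

    def route(self, skill: str) -> str:
        # A new agent card added tomorrow needs no code change here.
        return self.registry[skill]

tower = ControlTower()
for card in CARDS:
    tower.register(card)
print(tower.route("find_supplier"))  # -> https://supplier.example.com/a2a
```

This is the "zero config changes" property in miniature: the Control Tower's routing logic never names a specific agent, only a skill, so composition stays plug-and-play.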
Why Technology Alone Isn't Enough: The Execution Layer
After the demo at Builder Day, the most important conversations happened off-stage. Talking with Richard Seroter (Chief Evangelist, Google Cloud) and Abirami Sukumaran (Staff Developer Advocate, Google Cloud) surfaced a pattern that appears across every enterprise AI deployment: the technology is ready, but the organizational layer isn't unblocked.
CEOs are mandating agentic AI adoption. Developers are eager to build. The actual bottleneck is the Engineering Director tier — the leaders who bear the operational risk of integrating autonomous systems into production infrastructure.
The Risk Calculus: When an autonomous agent transitions from “providing a summary” to “executing code that alters database states and triggers supplier orders,” the blast radius of a failure expands exponentially. If the system errors, it is not the CEO who faces the operational fallout — it is the director who approved the deployment.
This is precisely why deterministic architectures are not just a technical preference — they are a governance requirement. By replacing probabilistic token generation with verified code execution, and by enforcing strict capability boundaries through the A2A protocol, we give engineering directors the observability and safety guarantees they need to approve production deployments with confidence.
The Architecture as a Risk Management Tool
Gemini Code Execution produces a verifiable, auditable output. AlloyDB ScaNN performs similarity search entirely within the database layer. A2A agent cards explicitly declare what each agent can and cannot do.
Together, these three primitives give Operational Leadership the governance structure they need: transparency at every step of the autonomous loop.
Build With Us: Resources & Next Steps
Everything built for this demo is open source and fully documented. Here's how to get your hands on it.
Talk Resources: Codelab, Source Code & Slides
The full talk page includes the official Google Codelab, the open-source GitHub repository, and the slide deck covering the shift from probabilistic to deterministic AI, the Think-Act-Observe loop architecture, and the ScaNN vs. HNSW analysis.
No Fluff. No Slides. Just Code.
Code Vipassana — named after the Buddhist practice of insight meditation — is a hands-on, instructor-led coding program initiated by Abirami Sukumaran within the Google Cloud ecosystem. Season 14 includes this exact Autonomous Supply Chain codelab as a featured session.
Designed purely for practitioners and cloud enthusiasts: guided codelabs, live design discussions, and zero marketing fluff. This is how builders level up.
Conclusion: The Architecture of the Autonomous Economy
What we built at Builder Day is a blueprint. Three primitives, each solving a distinct problem in the deterministic AI stack:
- Gemini 3 Flash + Code Execution replaces guessing with verified measurement.
- AlloyDB ScaNN replaces slow, memory-hungry search with sub-second vector retrieval at scale.
- The A2A Protocol replaces brittle, hardcoded API integrations with plug-and-play agent composition.
The goal is not to build AI that impresses in demos. The goal is to build AI that Operational Leadership can deploy in production with full confidence — because every output is auditable, every agent boundary is declared, and every count is verified by code, not conjecture.
The code is open source. The codelab is live. Come build with us.
