Distributed Systems

AlgoArena 3D: Benchmarking Polyglot Microservices on Google Cloud Run

A teaching tool for engineering teams: visualizing how interpreted vs compiled languages race in real time on serverless infrastructure.

Dec 1, 2025 · 12 min read · Mohit Bhimrajka
Cloud Run · C++ · Java · Python · WebSockets

Algorithms are usually taught in a vacuum. We learn Big-O notation on a whiteboard, but we rarely visualize how an interpreted language (Python) races against a compiled one (C++) in a distributed cloud environment.

As engineering leaders and educators, we often have to make architectural trade-offs. Is the developer velocity of Python worth the runtime overhead? Does the JIT compilation of Java justify the cold-start latency?

To answer these questions empirically, I built AlgoArena 3D—a real-time, distributed simulation platform. It orchestrates a "race" between three distinct microservices to solve dynamic pathfinding problems.

This wasn't just a coding exercise; it was a challenge in cloud-native orchestration. Here is how I leveraged Google Cloud Run and WebSocket streaming to build a low-latency, polyglot mesh.

[Diagram: the three services: C++ Worker (Raw Speed), Java Worker (JIT Compilation), Python Orchestrator (The Glue)]

The Architectural Challenge: Stateful Streaming in a Stateless World

The core requirement of AlgoArena is real-time visualization. We aren't just calculating a path; we are streaming the exploration frontier (every node visited) to the browser at 60 FPS.
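To make that concrete, here is a minimal, hedged sketch of such a streaming loop: an async generator that batches visited nodes into JSON frames on a roughly 60 FPS budget. The function name and message shape are illustrative, not AlgoArena's actual code; in the real system each yielded frame would be sent over the user's WebSocket.

```python
import asyncio
import json
from typing import AsyncIterator, Iterable, Tuple

FRAME_INTERVAL = 1 / 60  # target frame budget, ~16.7 ms


async def stream_frontier(
    visited: Iterable[Tuple[int, int]],
) -> AsyncIterator[str]:
    """Batch visited nodes into JSON frames emitted at roughly 60 FPS.

    `visited` stands in for the solver's exploration order.
    """
    loop = asyncio.get_running_loop()
    frame = []
    next_tick = loop.time() + FRAME_INTERVAL
    for node in visited:
        frame.append(node)
        if loop.time() >= next_tick:
            # Flush one frame's worth of frontier nodes.
            yield json.dumps({"type": "frontier", "nodes": frame})
            frame = []
            next_tick = loop.time() + FRAME_INTERVAL
            await asyncio.sleep(0)  # yield control to other tasks
    if frame:
        # Flush whatever is left when the solver finishes.
        yield json.dumps({"type": "frontier", "nodes": frame})
```

Batching per frame, rather than sending one message per node, keeps the WebSocket message rate bounded no matter how fast the worker expands nodes.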

This creates a conflict:

The Need: Persistent, full-duplex communication (WebSockets) to stream state updates.

The Constraint: Modern cloud architectures (Serverless) are typically designed for short-lived, stateless HTTP requests.

The Solution: Cloud Run's WebSocket Support

I chose Google Cloud Run specifically because of its robust support for long-lived WebSocket connections and session affinity. Unlike standard FaaS (Function-as-a-Service) platforms, which often kill connections after short timeouts, Cloud Run lets us treat containers as "servers" that scale to zero when idle but maintain persistent streams during active simulations.

Hub-and-Spoke Topology

[Diagram: the Python Orchestrator at the hub, with the C++ and Java Workers as spokes]
  1. The Orchestrator (Python FastAPI): Acts as the API Gateway. It terminates the user's WebSocket connection and manages the simulation state.
  2. The Workers (C++ Crow & Java Javalin): Stateless computational units running in separate Cloud Run services.

Cost Optimization: By isolating the heavy computation (C++/Java) from the orchestration logic (Python), we utilize Cloud Run's granular scaling. If users are only benchmarking C++, we don't pay for Java instances.

Deep Dive: The "Polyglot" Implementation

Building a system that speaks three languages requires strict contract definitions.
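One way to enforce such a contract is a single message schema that every service serializes to and from. The sketch below is a hedged Python illustration; the field names are mine, not AlgoArena's actual wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class StepUpdate:
    """One frontier update, with the same shape across all three services."""
    agent_id: str                    # e.g. "cpp-worker", "java-worker"
    step: int                        # monotonically increasing per run
    visited: List[Tuple[int, int]]   # nodes expanded since the last update
    elapsed_ms: float                # wall-clock time reported by the worker


def encode(update: StepUpdate) -> str:
    """Serialize to the JSON carried over the WebSocket."""
    return json.dumps(asdict(update))


def decode(raw: str) -> StepUpdate:
    """Parse an incoming message back into the typed contract."""
    data = json.loads(raw)
    data["visited"] = [tuple(v) for v in data["visited"]]
    return StepUpdate(**data)
```

The C++ and Java workers would emit the same JSON shape from their own type systems; the typed dataclass exists so the Python side fails loudly on a contract violation instead of silently forwarding malformed data.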

1. The C++ Worker (Raw Speed)

For the C++ microservice, I utilized the Crow framework.

  • Why? It's a microframework similar to Flask but for C++.
  • Optimization: We used std::priority_queue with custom memory management. The container image is built with a multi-stage Docker build, statically linking the C++ runtime (-static-libstdc++) so the final image pushed to Artifact Registry stays minimal. Of the three services, it has the fastest cold starts on Cloud Run.

2. The Java Worker (JIT Compilation)

The Java service runs on Javalin.

The Insight: Benchmarking Java is tricky because of the JVM Warm-up. In AlgoArena, you can visibly see the "First Run" penalty vs. the "Subsequent Run" speedup as the HotSpot compiler optimizes the A* pathfinding loops.
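That first-run vs. steady-state gap is easy to measure from the outside. Here is a small, hypothetical Python harness; in the real system `solve` would be an HTTP or WebSocket call to the Java worker, but any callable works for the sketch.

```python
import time
from statistics import median
from typing import Callable, List


def measure_runs(solve: Callable[[], object], runs: int = 20) -> List[float]:
    """Time each call in milliseconds; run 1 carries the warm-up penalty."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        solve()
        timings.append((time.perf_counter() - start) * 1000)
    return timings


def warmup_penalty(timings: List[float]) -> float:
    """Ratio of first-run latency to steady-state (median of later runs)."""
    return timings[0] / median(timings[1:])
```

Using the median of later runs, rather than the mean, keeps a single GC pause or network hiccup from skewing the steady-state baseline.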

3. The Python Orchestrator (The Glue)

The Python service uses asyncio to manage the "scatter-gather" pattern.

  • Pattern: When a user starts a race, the Python backend establishes asynchronous WebSocket links to the C++ and Java services internally. It acts as a message broker, receiving raw data from the workers, tagging it with metadata (e.g., agent_id: "cpp-worker"), and forwarding it to the frontend.
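The scatter-gather relay described above can be sketched with plain asyncio primitives. In this hedged version, async iterators and a queue stand in for the real worker and frontend WebSocket connections.

```python
import asyncio
from typing import AsyncIterator, Dict


async def relay(agent_id: str, worker: AsyncIterator[dict],
                frontend: asyncio.Queue) -> None:
    """Tag each message from one worker with its agent_id and forward it."""
    async for msg in worker:
        await frontend.put({**msg, "agent_id": agent_id})


async def race(workers: Dict[str, AsyncIterator[dict]],
               frontend: asyncio.Queue) -> None:
    """Scatter: one relay task per worker. Gather: wait for all to finish."""
    await asyncio.gather(
        *(relay(aid, w, frontend) for aid, w in workers.items())
    )
```

Because each relay runs as its own task, a slow worker never blocks a fast one; the frontend simply sees the two tagged streams interleaved.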

4. Observability: Proving the Benchmarks

To validate these benchmarks, I didn't rely on local console logs. I configured the Python service to emit Structured JSON Logs directly to Cloud Logging using pythonjsonlogger.

Why Structured Logs? Cloud Logging's Log Analytics allows SQL-like queries over log data. This let me run aggregation queries over thousands of simulation steps, proving empirically that the C++ worker maintained a P99 latency of under 5ms while Python hovered around 25ms.
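pythonjsonlogger ships a ready-made formatter for this; the stdlib sketch below approximates the same idea, emitting one JSON object per log line with a `severity` field that Cloud Logging recognizes on Cloud Run stdout. The `fields` convention for extra columns is my own illustration, not the library's API.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Minimal stand-in for pythonjsonlogger's JsonFormatter."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "severity": record.levelname,  # Cloud Logging maps this field
            "message": record.getMessage(),
        }
        # Extra fields (agent_id, latency_ms, ...) become queryable columns.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


logger = logging.getLogger("algoarena")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# One structured line per simulation step:
logger.info("step complete",
            extra={"fields": {"agent_id": "cpp-worker", "latency_ms": 4.2}})
```

Once every step lands as a JSON row, the P99 comparison becomes a single aggregation query in Log Analytics instead of a grep session.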

The "Hybrid" Logic: Dynamic Replanning

A key feature of AlgoArena is the Hybrid Lab, where users can place obstacles while the agent is moving. This mimics real-world robotics (e.g., a self-driving car detecting a road closure).

To handle this, I implemented a State Machine within the Python Orchestrator:

[State diagram: Phase 1 (Global Plan) → Interrupt Detected → Phase 2 (Dynamic Replan)]
  1. Phase 1 (Global Plan): The agent calculates a standard A* path.
  2. The Interrupt: Using Python's asyncio.wait, the service listens for user events (obstacle placement) without blocking the solver loop.
  3. Phase 2 (Dynamic Replan): Upon detecting a conflict, the Orchestrator freezes the worker, snapshots the agent's current location, updates the map, and hot-swaps to a new solver instance to calculate the remaining path.

Resilient Systems: This architecture demonstrates how to build systems that accept user interruption without crashing the processing loop—a critical concept for any developer building interactive AI tools.
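The interrupt step above can be sketched with asyncio.wait and FIRST_COMPLETED: each solver step races against the interrupt event, and whichever finishes first wins. `solver_step` here is a hypothetical coroutine, not AlgoArena's actual solver API.

```python
import asyncio


async def run_with_replan(solver_step, interrupt: asyncio.Event):
    """Advance the solver step by step, racing each step against interrupts.

    `solver_step` is a coroutine function returning the next node, or None
    when the goal is reached. On interrupt, the caller snapshots the path
    walked so far and hot-swaps in a new solver.
    """
    path = []
    while True:
        step_task = asyncio.ensure_future(solver_step())
        int_task = asyncio.ensure_future(interrupt.wait())
        done, pending = await asyncio.wait(
            {step_task, int_task}, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
        if int_task in done:
            return path, "replan"   # Phase 2: freeze here and replan
        node = step_task.result()
        if node is None:
            return path, "done"     # goal reached without interruption
        path.append(node)
```

Cancelling the losing task each iteration is what makes the "freeze" cheap: the solver is never left running against a stale map.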

Lessons for Engineering Leaders

Building AlgoArena 3D reinforced three critical lessons for cloud architecture:

  • Select the Runtime for the Workload

    Python is excellent for orchestration (IO-bound), but C++ offers a 10x improvement for graph traversal (CPU-bound). A polyglot architecture allows you to optimize costs by offloading CPU-intensive tasks to efficient languages.

  • Serverless for State? Yes.

    Google Cloud Run challenges the notion that serverless is only for stateless CRUD apps. With proper configuration (Session Affinity, CPU Allocation=Always), it handles complex WebSocket topologies effortlessly.

  • Visualization as a Teaching Tool

    As educators, we shouldn't just look at logs. By visualizing the memory consumption (nodes expanded) in 3D, we make the abstract trade-offs of algorithm design concrete for junior engineers.

Why Google Cloud?

Cloud Run isn't just "serverless Docker"—it's a Knative-based platform that handles WebSockets differently than AWS Lambda. Understanding these internal mechanics is what separates cloud operators from cloud architects.

AlgoArena demonstrates that with the right architectural patterns, you can build stateful, real-time, polyglot systems on fully managed serverless infrastructure.

Mohit Bhimrajka

Forward Deployed AI Engineer at Supervity. I build AI demos that close deals, then architect the systems to deliver on them. Code > Slides.

Want to learn more?

Explore the code on GitHub or get in touch to discuss AI architecture and production systems.