Simulator Pattern
When a stub isn't enough: building faithful test doubles for complex external dependencies
Related Concepts: Unit Testing | Acceptance Testing | Backend Integration Testing | Acceptance Testing Guidelines
What is a simulator?
A simulator is a test double that models the behavior of an external system — not just its response shapes, but its state transitions, interaction rules, and the realistic scenarios that emerge from sustained interaction. Where a stub returns canned data for a single call, a simulator faithfully reimplements enough of the external system's logic that your tests exercise realistic conditions without ever touching the real thing.
In the Meszaros taxonomy, a simulator is a fake — it contains real working logic that takes shortcuts unsuitable for production. But it occupies the far end of the fake spectrum. An in-memory repository is a fake that replaces a database with a HashMap. A simulator is a fake that replaces a complex external service with a purpose-built model of how that service actually behaves.
The distinction matters because simulators are not mocks. They do not verify interactions. They do not assert that your code called specific methods in a specific order. They produce realistic data, model realistic behavior, and enforce realistic constraints so that your system can be tested against conditions that closely mirror production — without the non-determinism, latency, and fragility of depending on a live external service.
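To make the contrast concrete, here is a minimal sketch (TypeScript, with hypothetical names) of the simple end of that fake spectrum: an in-memory repository backed by a Map. A simulator applies the same "real working logic, shortcuts unsuitable for production" idea to a much richer model of an external service.

```typescript
// Hypothetical repository port that the production code depends on.
interface CustomerRepository {
  save(customer: { id: string; email: string }): Promise<void>;
  findById(id: string): Promise<{ id: string; email: string } | undefined>;
}

// The simple end of the fake spectrum: real logic, backed by a Map instead of a database.
class InMemoryCustomerRepository implements CustomerRepository {
  private readonly rows = new Map<string, { id: string; email: string }>();

  async save(customer: { id: string; email: string }): Promise<void> {
    this.rows.set(customer.id, customer);
  }

  async findById(id: string): Promise<{ id: string; email: string } | undefined> {
    return this.rows.get(id);
  }
}
```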
When a stub is enough and when it isn't
Not every external dependency warrants a simulator. The decision turns on how deeply your system depends on the external service and how complex that dependency's behavior is.
Stub territory
A system that listens for a single pull_request.closed webhook to trigger a cleanup task does not need a simulator. The interaction is simple: one event type, one payload shape, one code path. A factory function that produces a well-typed fixture payload is the right tool. You might have two or three variations — success, malformed payload, missing fields — and that's the entire surface area. A stub gives you everything you need. Building a simulator here would be over-engineering your test infrastructure for a peripheral feature.
Similarly, if your system calls a third-party geocoding API to convert an address to coordinates during user registration, that's a single stateless call. A stub that returns a canned lat/long pair is perfectly adequate.
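For these stub-territory cases, a plain factory function covers the whole surface area. A minimal sketch, assuming a hypothetical pull_request.closed fixture; the field names are illustrative rather than any platform's exact schema:

```typescript
// Hypothetical fixture factory for the single webhook this system cares about.
// Overrides let a test pin the one or two fields it actually asserts on.
interface PullRequestClosedEvent {
  action: "closed";
  pull_request: { number: number; merged: boolean };
  repository: { full_name: string };
}

export function pullRequestClosedEvent(
  overrides: Partial<PullRequestClosedEvent> = {},
): PullRequestClosedEvent {
  return {
    action: "closed",
    pull_request: { number: 42, merged: true },
    repository: { full_name: "acme/widgets" },
    ...overrides,
  };
}

// Usage in a test: the happy path plus one variation is the entire surface area.
const merged = pullRequestClosedEvent();
const unmerged = pullRequestClosedEvent({ pull_request: { number: 7, merged: false } });
```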
Simulator territory
Consider instead a system that deeply integrates with a payment processor. Charges have lifecycles — they're created, authorized, captured, partially refunded, fully refunded, or disputed. Disputes have their own state machine with evidence submission deadlines and resolution outcomes. Refunds interact with the original charge's state. Webhook notifications arrive asynchronously to inform your system of state transitions that happened on the processor's side. Your system's business logic depends on correctly handling all of these states and transitions.
Or consider a system whose primary function is reacting to a stream of webhook events from a CI/CD platform, where builds transition through queued, in-progress, and completed states, jobs within builds have parallel lifecycles, events can arrive out of order, and failed builds get retried.
In both cases the external dependency isn't peripheral — it's central to what the system does. The interactions aren't simple request-response pairs — they involve state machines, lifecycle transitions, and correlated sequences of data. The number of meaningful test scenarios is combinatorial. Stubbing individual payloads would mean hand-crafting dozens of carefully correlated fixtures across your test suite, each with IDs, timestamps, and state fields that must be internally consistent. That's fragile, error-prone, and scatters knowledge about the external system's behavior across individual test files instead of centralizing it.
The decision heuristic
Build a simulator when:
- The dependency is central to your system's primary function, not a peripheral integration. If the external service disappeared tomorrow, your system would be fundamentally broken, not just missing a feature.
- The interaction is stateful. The external system has its own state machine, and your system's behavior depends on correctly handling that state — whether you're driving transitions via API calls, reacting to inbound events, or both.
- The number of meaningful scenarios is large. You need to test not just the happy path but failure modes, partial completions, edge cases in the external system's lifecycle, and error conditions that only emerge from realistic interaction patterns.
- A significant portion of your test suite needs realistic versions of this dependency's behavior. If only two tests touch the external service, a stub is fine. If twenty tests across unit, integration, and E2E suites all need well-formed data from this dependency, that's simulator territory.
When the answers are "peripheral, stateless, few scenarios, few tests" — use a stub or factory. When they're "central, stateful, many scenarios, widely needed" — build a simulator.
Anatomy of a simulator
A well-built simulator typically has three layers, each serving a different level of abstraction. Not every simulator needs all three — the depth depends on how your test suite consumes the simulated behavior. But the layering pattern is consistent enough to be worth understanding as a blueprint.
Layer 1: Data factories
The foundation is a set of factory functions that produce single, well-typed data structures representing the external system's entities and interactions. Each factory generates one payload — an API response, a webhook event, a resource representation — with realistic data: random but valid IDs, properly formatted timestamps, correct enum values, consistent relationships between fields. Factories conform to the external system's actual types or schemas.
Factories accept optional overrides so tests can pin specific values when they matter, while letting the simulator fill in realistic defaults when they don't. Shared fixtures — a merchant account, a customer profile, an organization — ensure consistency across related data without requiring tests to manually wire these details together.
This layer is the simulator's vocabulary — the individual building blocks from which realistic scenarios are composed.
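As an illustration, a Layer 1 factory for a payment-processor charge might look like the following sketch; the Charge shape and field names are assumptions for the example, and a real simulator would type them against the provider's published schema:

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical shape of the external processor's charge resource.
export interface Charge {
  id: string;
  amount: number; // minor units (e.g. cents)
  currency: string;
  status: "created" | "authorized" | "captured" | "refunded" | "disputed";
  customer_id: string;
  created_at: string; // ISO 8601 timestamp
}

// Shared fixture: one customer reused across related payloads for consistency.
export const testCustomer = { id: `cus_${randomUUID()}`, email: "jane@example.com" };

// Factory: realistic defaults, with overrides for the fields a test cares about.
export function chargeFactory(overrides: Partial<Charge> = {}): Charge {
  return {
    id: `ch_${randomUUID()}`,
    amount: 2500,
    currency: "usd",
    status: "created",
    customer_id: testCustomer.id,
    created_at: new Date().toISOString(),
    ...overrides,
  };
}
```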
Layer 2: Scenario composers
The middle layer composes individual data structures into realistic scenarios that model how the external system actually behaves. This is where the simulator's real value emerges.
A scenario composer knows the external system's state machine. For a payment processor, it knows that a charge must be authorized before it can be captured, that a refund references a captured charge and adjusts its state, that a dispute can only be opened against a completed charge, and that each transition produces the right combination of API response data and webhook notifications. For an event-driven system, it knows the valid ordering of lifecycle events and the correlations between them.
This knowledge is encoded once, in the composer, rather than scattered across individual test files. Tests express what scenario they want — a successful charge-and-capture, a charge that gets disputed, a partial refund — and the composer produces the correct data.
The key design goal is composability. A good scenario composer lets tests assemble arbitrary situations from simple primitives. The specific mechanism (builder methods, fluent API, static factory methods) is an implementation choice. What matters is that new scenarios don't require new hand-crafted fixtures — they're assembled from existing building blocks.
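A sketch of a composer built on the factory above, using a fluent builder purely as one possible mechanism; the disputed-charge lifecycle and the error messages are illustrative:

```typescript
import { Charge, chargeFactory } from "./factories"; // hypothetical Layer 1 module

// A webhook event wrapping a charge, as the processor would deliver it.
export interface ChargeEvent {
  type: "charge.authorized" | "charge.captured" | "charge.disputed" | "charge.refunded";
  data: Charge;
}

// The composer encodes the processor's state machine once:
// authorize -> capture -> (dispute | refund), each step emitting the matching event.
export class ChargeScenario {
  private charge: Charge = chargeFactory();
  private events: ChargeEvent[] = [];

  authorized(): this {
    this.charge = { ...this.charge, status: "authorized" };
    this.events.push({ type: "charge.authorized", data: this.charge });
    return this;
  }

  captured(): this {
    if (this.charge.status !== "authorized") {
      throw new Error("a charge must be authorized before it can be captured");
    }
    this.charge = { ...this.charge, status: "captured" };
    this.events.push({ type: "charge.captured", data: this.charge });
    return this;
  }

  disputed(): this {
    if (this.charge.status !== "captured") {
      throw new Error("only captured charges can be disputed");
    }
    this.charge = { ...this.charge, status: "disputed" };
    this.events.push({ type: "charge.disputed", data: this.charge });
    return this;
  }

  build(): { charge: Charge; events: ChargeEvent[] } {
    return { charge: this.charge, events: this.events };
  }
}

// Tests express the scenario they want; the composer keeps the data internally consistent.
const disputedCharge = new ChargeScenario().authorized().captured().disputed().build();
```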
Layer 3: Transport drivers
The outermost layer handles the mechanics of how simulated data reaches your system. The shape of this layer depends entirely on the nature of the integration.
For an event-driven integration (webhooks, message queues), the transport driver pushes data into your system — POSTing events over HTTP with proper authentication headers, publishing messages to a queue, or emitting events on a bus.
For an API-based integration where your system is the caller, the transport driver receives requests from your system and responds with simulator-generated data. This might be network-level request interception (using a tool like MSW) or a local test HTTP server standing in for the real API, either way responding according to the simulator's state.
Some integrations involve both directions — your system calls the external API to initiate actions, and the external system sends webhooks to notify you of asynchronous outcomes. A simulator for this kind of integration may need both a request handler and an event pusher working together.
This layer is only needed for integration and E2E tests. Unit tests typically consume simulator-generated data directly without transport.
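As one possible shape for the event-pushing variant, the sketch below POSTs simulator-generated events into the system under test; the endpoint path, header name, and HMAC signing scheme are assumptions, not any particular provider's contract:

```typescript
import { createHmac } from "node:crypto";
import type { ChargeEvent } from "./scenarios"; // hypothetical Layer 2 module

// Pushes simulator-generated events into the system under test the same way
// the real processor would: an HTTP POST carrying a signature the receiver verifies.
export async function deliverWebhook(
  baseUrl: string,
  event: ChargeEvent,
  secret: string,
): Promise<Response> {
  const body = JSON.stringify(event);
  const signature = createHmac("sha256", secret).update(body).digest("hex");

  return fetch(`${baseUrl}/webhooks/payments`, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-simulator-signature": signature, // illustrative header name
    },
    body,
  });
}
```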
One simulator across the entire test pyramid
One of the most powerful properties of a well-layered simulator is that it serves every level of the test pyramid. The same simulator infrastructure — the same factories, the same scenario composers — gets consumed differently depending on the test context.
Unit tests use scenario composers to generate realistic data, then pass it directly to pure domain functions. No HTTP, no application framework, no database. The test is fast and focused: given this data representing a particular scenario, does the domain logic produce the correct result? The simulator provides realistic input; the test verifies behavior.
Integration tests feed simulator-generated data through use case handlers or service layers, with real (or in-memory) repositories, but without real external service calls. This validates that the application's internal wiring correctly processes realistic scenarios — that state is updated, side effects are triggered, and edge cases are handled.
E2E tests use the full simulator stack including the transport driver. This validates the complete pipeline — from receiving an inbound event or intercepting an outbound API call, through authentication and parsing, to processing and persistence.
The same scenario — say, a payment charge that gets disputed after capture — can be tested at all three levels. The domain logic test verifies the state calculation is correct. The integration test verifies the use case orchestrates correctly. The E2E test verifies the full system handles it end-to-end. Because they all share the same simulator, the data is consistent across levels.
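At the unit level, consuming that shared scenario might look like the sketch below; resolveChargeState and the asserted fields are hypothetical, and the integration and E2E variants would route the same events through the use case layer and the transport driver instead:

```typescript
import { test, expect } from "vitest"; // or the project's test runner
import { ChargeScenario } from "./scenarios"; // hypothetical Layer 2 module
import { resolveChargeState } from "../domain/charges"; // hypothetical domain function

test("a charge disputed after capture is flagged for evidence submission", () => {
  // Same composer the integration and E2E tests use; here the data goes
  // straight into a pure domain function with no HTTP or database involved.
  const { events } = new ChargeScenario().authorized().captured().disputed().build();

  const state = resolveChargeState(events);

  expect(state.status).toBe("disputed");
  expect(state.requiresEvidence).toBe(true);
});
```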
This reuse is a concrete payoff of investing in a simulator rather than scattering ad-hoc fixtures across test files. The simulator becomes shared testing infrastructure that scales with the test suite.
How simulators relate to the broader testing philosophy
Simulators are a natural extension of the principles documented in our existing testing guides. They don't represent a new philosophy — they're what happens when you apply our existing philosophy to a specific class of problem.
Mocking is a smell, but faking boundaries is essential
Our unit testing guide establishes that excessive mocking signals a design problem. The solution is to isolate side effects and test real collaborators sociably. Simulators operate at the same boundary where our guides already endorse test doubles — the architectural seam where your system meets the external world. A simulator doesn't mock your internal collaborators. It replaces an external dependency you genuinely cannot use in tests with a faithful model of that dependency's behavior.
The key difference from a simple stub is fidelity. A stub provides canned data. A simulator models behavior. When the external dependency's behavior is simple, a stub's fidelity is sufficient. When the dependency's behavior is complex, stateful, and central to your system, you need the higher fidelity that a simulator provides.
The acceptance testing driver layer
Our acceptance testing guide defines a layered architecture — specification, driver, infrastructure — and explicitly names simulators as a component of the driver layer. A simulator is the driver layer's answer to external dependencies: it provides the same domain-specific abstraction that workflows, API clients, and page objects provide for other aspects of the system under test.
Integration testing's "real dependencies" principle
Our backend integration testing guide insists on real dependencies — real PostgreSQL, no database mocking. Simulators complement this principle for dependencies that can't be real in a test environment. You can run a real database in CI. You cannot run a real third-party payment processor or CI/CD platform. The simulator fills that gap without compromising the principle: use real dependencies wherever possible, and where it's not possible, use the most faithful replacement you can build.
Building and maintaining simulators
Start from real data
The best simulators are built by studying real interactions with the external system. Capture actual API responses, webhook deliveries, or message queue payloads and use them as the basis for your factories. This grounds your simulator in reality rather than assumptions about what the external system sends.
Type your factories against the external system's official types (or generate types from their OpenAPI/JSON Schema specs) so the compiler catches drift between your simulator and the real contract.
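One way to get that compile-time check, assuming types generated with a tool such as openapi-typescript, is to validate captured fixtures with a satisfies check; the module paths and payload values here are illustrative:

```typescript
// Assuming types generated from the provider's OpenAPI spec, e.g. with openapi-typescript:
//   npx openapi-typescript processor-openapi.yaml -o src/test/processor-schema.ts
import type { components } from "./processor-schema"; // hypothetical generated module

type Charge = components["schemas"]["Charge"];

// A payload captured from a real sandbox interaction, checked against the contract
// at compile time so drift between simulator and real API shows up as a type error.
export const capturedCharge = {
  id: "ch_9f2c1d",
  amount: 2500,
  currency: "usd",
  status: "captured",
  customer_id: "cus_7a41b0",
  created_at: "2024-03-18T09:30:00Z",
} satisfies Charge;
```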
Evolve the simulator as the integration evolves
A simulator is living test infrastructure, not a write-once artifact. When you add a new feature that handles a new interaction pattern or response shape, extend the simulator to cover it. When the external system changes its API, update the simulator alongside your production code. The simulator should be treated with the same care as production code — because the entire test suite's reliability depends on its accuracy.
Keep it centralized
A simulator's value comes from being the single source of truth for "how does the external system behave?" If individual test files start hand-crafting their own payloads instead of using the simulator, the knowledge scatters and the benefits evaporate. Treat the simulator as a shared library that tests consume, not a pattern that tests reimplement.
Know when you don't need one
Resist the temptation to build a simulator for every external integration. Most integrations are simple enough that a factory function or a few stub payloads are sufficient. The investment in a simulator — modeling the state machine, maintaining it as the external system evolves — only pays off when the dependency is complex, central, and widely tested. For peripheral integrations with simple interactions, a stub is the right tool and a simulator is over-engineering.
Summary
A simulator is a high-fidelity fake that models the behavior of a complex external dependency — its state machines, interaction patterns, and realistic scenarios — so your test suite can exercise conditions that closely mirror production without depending on a live external service.
Build a simulator when the external dependency is central to your system's function, the interaction is stateful, the scenario space is large, and a significant portion of your test suite needs realistic versions of that dependency's behavior. Use stubs and factories for everything else.
A well-layered simulator — data factories, scenario composers, transport drivers — serves the entire test pyramid from unit tests to E2E tests, centralizes knowledge about the external system's behavior, and scales with the test suite by composing complex scenarios from simple primitives.