Integration Tests – Reclaiming the Middle

Part 3 of “The Test Pyramid — Reimagined.” Start with the opener if you missed it.

There is one layer in your test suite that gives you more product confidence per line of test code than any other. Most teams barely write any of it.

That layer is integration tests, and most teams have so few of them because nobody quite knows where they belong. The unit camp thinks anything with a database is too coarse. The e2e camp thinks anything without a browser is too fake. The middle gets squeezed between them, and the layer that should be doing the most work in your suite ends up doing almost none.

This post is about reclaiming the middle.

What I mean by “integration test”

In the opener I gave you the one-line version: do not mock your own code; do mock external dependencies. Let me unpack that.

An integration test, in my model:

Does not mock your own code. This is the rule that defines the layer. The seams between your modules run for real — controllers actually call services, services actually call repositories, repositories actually hit a real database (in-memory or Testcontainers, your call). If you find yourself reaching for a mock of something you wrote yourself, you’re back at the unit layer.
Does mock anything you don’t own. Third-party APIs. Payment processors. Email providers. The flaky vendor SDK that times out at 3 AM. Anything that lives outside your codebase and outside your control gets stubbed at the boundary.
Has no UI. No browser. No Selenium. No Playwright. The integration layer exercises your application’s logic, not your application’s pixels.
Runs pre-deploy, on every PR. This is non-negotiable. If your “integration test” requires a deployed environment, it is not an integration test in this model. It’s something else. We’ll get to that.

The defining property is the boundary. Your code, run for real, integrated with itself. Their code, replaced with a stub. The line between “yours” and “theirs” is the line where mocking starts.
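To make the four rules concrete, here is a minimal TypeScript sketch that treats them as a checklist. The `TestTraits` shape and the `integrationRuleViolations` function are names I made up for illustration, not part of any library; read it as a thinking aid under those assumptions, not as tooling.

```typescript
// Hypothetical traits describing one test. The field names are illustrative.
interface TestTraits {
  mocksOwnCode: boolean;             // replaces a module you wrote with a mock
  callsRealExternalSystems: boolean; // talks to a vendor API instead of a stub
  drivesUi: boolean;                 // needs a browser (Selenium, Playwright, ...)
  needsDeployedEnvironment: boolean; // cannot run on a laptop or in a PR build
}

// Returns the rules a test breaks. An empty list means it fits the integration layer.
function integrationRuleViolations(traits: TestTraits): string[] {
  const violations: string[] = [];
  if (traits.mocksOwnCode) violations.push("mocks your own code");
  if (traits.callsRealExternalSystems) violations.push("calls a system you do not own");
  if (traits.drivesUi) violations.push("drives a UI");
  if (traits.needsDeployedEnvironment) violations.push("needs a deployed environment");
  return violations;
}

// A service-to-database test with the payment provider stubbed passes all four rules.
const orderFlowTest: TestTraits = {
  mocksOwnCode: false,
  callsRealExternalSystems: false,
  drivesUi: false,
  needsDeployedEnvironment: false,
};

// The same test pointed at a staging environment breaks the pre-deploy rule.
const stagingSmokeTest: TestTraits = { ...orderFlowTest, needsDeployedEnvironment: true };

console.log(integrationRuleViolations(orderFlowTest));
console.log(integrationRuleViolations(stagingSmokeTest));
```

A test that comes back with violations is not necessarily a bad test. It just belongs to a different layer, with different cost-benefit math.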

Why it’s the highest-leverage layer

Here’s the principle, before anything else:

You test at the integration layer what cannot be tested at the unit layer.

Integration tests are not a higher-cost re-run of your unit suite. They are not “the same tests, but with the database turned on.” They are a different layer with a different job, and the job is to catch the class of bugs that only exist when modules are wired together — bugs that, by definition, your unit tests cannot see, because your unit tests deliberately replaced the wiring with mocks.

A non-exhaustive list of those bugs:

The controller passes the wrong field to the service. The unit tests for both pass: the controller test mocked the service, and the service test called the service directly with the field it expected. The integration test, which runs both, fails immediately.
The repository expects camelCase; the database column is snake_case. Unit tests with mocked repositories don’t touch a real DB. The integration test does.
The new code path bypasses a side effect the old code path relied on. Each unit test is happy in isolation. The integration test, which exercises the orchestration, isn’t.
The transaction boundary is wrong. A retry, a partial failure, a rollback that doesn’t roll back what you thought it would. None of this exists at the unit layer.
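The first bug on that list is easy to reproduce in miniature. The sketch below is a hypothetical TypeScript example (every class and function name is invented for illustration): a controller that swaps two arguments, a mock that accepts anything, and a real service that does not.

```typescript
// All names here are hypothetical, invented for this illustration.
interface PlaceOrderRequest {
  userId: string;
  sku: string;
}

interface OrderService {
  placeOrder(customerId: string, sku: string): string;
}

// Real service: rejects anything that is not a known customer id.
class RealOrderService implements OrderService {
  private readonly knownCustomers: Set<string>;

  constructor(knownCustomers: Set<string>) {
    this.knownCustomers = knownCustomers;
  }

  placeOrder(customerId: string, sku: string): string {
    if (!this.knownCustomers.has(customerId)) {
      throw new Error(`unknown customer: ${customerId}`);
    }
    return `order:${customerId}:${sku}`;
  }
}

// Controller with the bug: it passes the SKU where the customer id belongs.
class OrderController {
  private readonly service: OrderService;

  constructor(service: OrderService) {
    this.service = service;
  }

  handle(req: PlaceOrderRequest): { status: number; body: string } {
    try {
      const orderId = this.service.placeOrder(req.sku, req.userId); // arguments swapped
      return { status: 201, body: orderId };
    } catch (err) {
      return { status: 400, body: (err as Error).message };
    }
  }
}

const orderRequest: PlaceOrderRequest = { userId: "user-42", sku: "sku-88" };

// Unit view of the controller: the service is a mock that accepts anything.
const mockService: OrderService = { placeOrder: () => "order:mocked" };
const controllerUnitResult = new OrderController(mockService).handle(orderRequest);

// Unit view of the service: called directly, with the arguments in the right order.
const realService = new RealOrderService(new Set(["user-42"]));
const serviceUnitResult = realService.placeOrder("user-42", "sku-88");

// Integration view: the real controller wired to the real service.
const wiredResult = new OrderController(realService).handle(orderRequest);

console.log(controllerUnitResult.status); // 201
console.log(serviceUnitResult);           // order:user-42:sku-88
console.log(wiredResult.status);          // 400
```

Both unit-level checks are green, and neither is wrong about what it tested. The failure only exists once the two pieces are connected.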

Notice what’s not on this list: anything you could have tested without the wiring. If a piece of behavior can be exercised at the unit layer, it should be — unit tests are faster, more precise, and tell you which line broke. The integration layer’s job isn’t to repeat that work. The integration layer’s job is to validate the wiring itself, and the wiring’s emergent behavior, and nothing else.

This is the part most teams get wrong when they start writing integration tests. They take a unit test, add a database, call it integration, and feel productive. What they’ve actually built is a slow, less-precise version of a test they already had. The integration suite balloons, the unit suite atrophies, and the pyramid quietly inverts in the worst way: same coverage, twice the runtime, half the diagnostic value when a test fails.

Don’t do that. Each test belongs at exactly one layer — the lowest layer where the behavior it covers can actually be observed. Unit tests for what a single unit does; integration tests for what only the wiring can break.
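Here is a rough sketch of that split in TypeScript. The names are hypothetical, the discount rule is an assumption made up for the example, and the in-memory `Map` stands in for a real database so the snippet stays self-contained.

```typescript
// Hypothetical names throughout; the Map below stands in for a real database.
interface OrderRow {
  id: string;
  totalCents: number;
}

// Pure logic: every branch is observable without any wiring, so it is unit-tested.
function orderTotalCents(unitPriceCents: number, quantity: number): number {
  if (quantity <= 0) {
    throw new Error("quantity must be positive");
  }
  const subtotal = unitPriceCents * quantity;
  // Assumed rule for illustration: 10% off orders of ten or more items.
  return quantity >= 10 ? Math.round(subtotal * 0.9) : subtotal;
}

class OrderStore {
  private readonly rows = new Map<string, OrderRow>();

  save(row: OrderRow): void {
    this.rows.set(row.id, row);
  }

  find(id: string): OrderRow | undefined {
    return this.rows.get(id);
  }
}

// Wiring: computes the total and hands it to persistence.
function placeOrder(store: OrderStore, id: string, unitPriceCents: number, quantity: number): void {
  store.save({ id, totalCents: orderTotalCents(unitPriceCents, quantity) });
}

// Unit layer: cover the pricing branches directly, one case per branch.
const pricingCases = [
  { quantity: 2, expectedCents: 2000 },
  { quantity: 10, expectedCents: 9000 },
];
const pricingFailures = pricingCases.filter(
  (c) => orderTotalCents(1000, c.quantity) !== c.expectedCents
);

// Integration layer: a single check that the computed total reaches the store.
// It does not walk through the pricing branches again.
const store = new OrderStore();
placeOrder(store, "order-1", 1000, 10);
const persisted = store.find("order-1");

console.log(pricingFailures.length);
console.log(persisted);
```

The pricing cases can grow without ever touching a database, and the wiring check stays at one test no matter how many discount rules show up.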

What a clean integration test looks like

Same shape, two languages.

Java + Spring Boot + Testcontainers + Mockito — testing an OrderService that persists to Postgres and calls an external payment provider. The DB is real (in a container); the payment provider is mocked at the client boundary.

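import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import java.math.BigDecimal;
import java.util.UUID;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.mock.mockito.MockBean;
import org.springframework.test.context.DynamicPropertyRegistry;
import org.springframework.test.context.DynamicPropertySource;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

@SpringBootTest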
@Testcontainers
class OrderServiceIntegrationTest {

    @Container
    static PostgreSQLContainer<?> postgres =
        new PostgreSQLContainer<>("postgres:16");

    @DynamicPropertySource
    static void registerDb(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", postgres::getJdbcUrl);
        registry.add("spring.datasource.username", postgres::getUsername);
        registry.add("spring.datasource.password", postgres::getPassword);
    }

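    @Autowired OrderService orderService;
    @Autowired OrderRepository orderRepository;

    // Stub only the boundary we don't own. On Spring Boot 3.4+, @MockBean is
    // deprecated in favor of @MockitoBean.
    @MockBean PaymentClient paymentClient;

    @Test
    void placesAnOrderAndChargesThePaymentProvider() {
        // Arrange: the external provider approves a 99.00 charge.
        when(paymentClient.charge(any(), eq(new BigDecimal("99.00"))))
            .thenReturn(PaymentResult.success("txn-123"));

        // Act: real service, real repository, real Postgres.
        UUID orderId = orderService.placeOrder("user-42", "sku-88");

        // Assert: the order was persisted with the provider's reference.
        Order saved = orderRepository.findById(orderId).orElseThrow();
        assertThat(saved.getStatus()).isEqualTo(OrderStatus.CONFIRMED);
        assertThat(saved.getPaymentReference()).isEqualTo("txn-123");

        // Assert: the provider was charged for the right user and amount.
        verify(paymentClient).charge("user-42", new BigDecimal("99.00"));
    }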
}

Everything from the service down runs for real: service layer, repository layer, real Postgres, real transactions. Only the PaymentClient, the boundary to a system we don’t own, is stubbed. Note that this test enters at the service, so the controller is not exercised here; the Node example that follows goes in through the HTTP layer. If the service passes the wrong user ID to the payment client, this test fails. If the repository column mapping is wrong, this test fails. If the service forgets to charge the payment provider entirely, this test fails. A unit test that mocks the repository cannot see the mapping failure at all.

Node + Express + Supertest + nock — testing the /orders endpoint of an Express app. Real Express, real route handlers, real DB (Postgres via Testcontainers); the external payment HTTP call is intercepted by nock.

import request from "supertest";
import nock from "nock";
import { app } from "../src/app";
import { db } from "../src/db";

describe("POST /orders", () => {
  beforeEach(async () => {
    await db.query("TRUNCATE orders CASCADE");
    nock.cleanAll();
  });

  it("places an order and charges the payment provider", async () => {
    // Stub the external payment API at the network boundary.
    const paymentApi = nock("https://payments.example.com")
      .post("/charge", { userId: "user-42", amount: 99.00 })
      .reply(200, { transactionId: "txn-123" });

    // Drive the real Express app through its HTTP interface.
    const res = await request(app)
      .post("/orders")
      .send({ userId: "user-42", sku: "sku-88" })
      .expect(201);

    expect(res.body.status).toBe("CONFIRMED");
    expect(res.body.paymentReference).toBe("txn-123");

    // The row must have been written by the real persistence code.
    const { rows } = await db.query(
      "SELECT status, payment_reference FROM orders WHERE id = $1",
      [res.body.id]
    );
    expect(rows[0].status).toBe("CONFIRMED");
    expect(rows[0].payment_reference).toBe("txn-123");

    // Fails if the app never sent the expected payment request.
    expect(paymentApi.isDone()).toBe(true);
  });
});

Same shape. Real route handler. Real DB. External HTTP call stubbed at the network boundary. One test, four layers exercised, no infrastructure beyond a test container.

This is what the integration layer is supposed to look like. It’s the cheapest way to get this much coverage.

Where teams under-invest

If integration tests are this useful, why do most suites have so few of them? Three reasons.

Reason 1: the team has been burned by slow integration tests in the past. Somebody once wrote a test that spun up the entire app, hit fifteen endpoints, and took forty seconds. The team correctly hated that test. The team incorrectly concluded that all integration tests are like that. They aren’t. A well-scoped integration test runs in a couple of seconds. A whole suite of them runs in under a minute. If yours don’t, the problem is the tests, not the layer.

Reason 2: the SDETs are all working in Selenium and the devs are all working in unit tests, and the integration layer doesn’t have an obvious owner. This one is structural and it’s the most common. Whoever owns the integration layer has to think about both the application internals (dev-shaped knowledge) and the test strategy (SDET-shaped knowledge). Without explicit ownership, the layer falls between two stools.

Reason 3: nobody can agree on what an integration test actually is, and the team has spent enough standup minutes debating it to fill a small book. Half the team thinks “anything with a database is integration.” The other half thinks “anything that doesn’t go through the browser is unit.” A third faction shows up with terms like service test or component test and the conversation goes another fifteen minutes. None of this changes whether the test is good. None of it changes whether the test should ship. Pick a definition, write it down, move on — the dictionary isn’t going to save you, and the time you spend classifying the test is time you aren’t spending writing the next one.

The fix for all three is the same: explicit attention from a senior engineer or SDET who is willing to define the layer for the team, write the first batch of tests as exemplars, and defend the layer in PR review when somebody tries to fold it back into “just write more unit tests” or “we’ll catch it in Selenium.”

The trophy in the room

I owe Kent C. Dodds a paragraph. Two posts ago I said I’d take up the disagreement properly here, and the integration layer is where it lives.

Kent’s testing trophy — “write tests, not too many, mostly integration” — is the model that’s done the most to popularize integration testing in the last decade. He’s right that integration tests are excellent. He’s right that most teams under-invest in them. He’s right that one good integration test is often worth more than five mocked-up unit tests. I agree with all of that, and his work on Testing Library is genuinely one of the best things that’s happened to UI testing in the last ten years.

Where I part ways: the corollary. Kent’s argument is essentially “more integration, fewer unit.” Mine is “more integration, also more unit, because they catch different things and the unit layer also pressures the code.”

The specific place this matters: what Kent calls “integration” includes things I’d call unit tests. Rendering a tree of React components with Testing Library and asserting on the DOM is, by my definition, a unit test. No infrastructure, no boundary crossings beyond the component tree, milliseconds to run. Kent labels it integration because it integrates components. I label it unit because it integrates components that all live in the same module of code with no external dependencies. Different vocabulary, same underlying tests. Worth knowing about, because if you read the trophy as “do less of what you’re doing in JUnit,” you’ll skip a layer that’s still pulling weight.

The takeaway is not “Kent is wrong.” The takeaway is “if you’re writing your test strategy from the trophy, double-check that what you call a unit test isn’t what Kent calls an integration test, because the recommendation ‘fewer unit tests’ will mean very different things to the two of you.”

Use both layers. Aggressively.

Pre-deploy or bust

One last rule, and it’s a hard one:

Integration tests run pre-deploy. On every PR. Without exception.

If you find yourself building “integration tests” that require a deployed staging environment, a real downstream service, or any infrastructure that doesn’t exist on a developer’s laptop, you’ve built something else. Probably an API test or a system test, both of which are above the deployment line and have different cost-benefit math.

The whole reason integration tests are the highest-leverage layer is that they catch real wiring bugs before code ships. Move them above the line and you’ve kept the cost of integration tests while losing the benefit. Don’t.

What’s next

Up next: the deployment line itself. The most underdrawn line in modern testing diagrams, the thing that resolves the bulk of “what kind of test is this?” arguments before they start, and the structural reason this entire model holds together.

Subscribe, RSS, bookmark, whatever your preferred mechanism is. We’re halfway up the pyramid.
