How I Achieve Continuous Delivery With Confidence

I used to feel it on Thursdays. A branch had been open for four days, it had grown into an 800-line diff, and the only way to land it was to merge it and hope. The test suite was green, but the tests were all unit tests - isolated functions, mocked dependencies, no concept of the application as a whole. Nobody had run the actual flows end-to-end. Nobody had verified that the API endpoints still returned the right shapes after the refactor. Nobody had checked whether the shared component we changed had quietly broken a screen on the other side of the app.
That anxiety is not inevitable. It is a symptom of specific practices: large branches, deferred integration, tests that cover the parts but not the whole. Here is the CI loop I run instead, from the first commit to the merge button.
The Branch Discipline That Makes Everything Else Work
Before any tooling: branch size.
I keep branches open for one to two days at most. Each branch maps to exactly one shippable behaviour - one endpoint added, one component updated, one migration applied, one feature flag enabled. Not a week of work shaped into a single PR on a Friday morning.
This is a forcing function, not a restriction. When a branch has to close in 48 hours, there is no room to scope-creep it. The PR becomes reviewable in under fifteen minutes because the diff is genuinely small and the intent is obvious. Merge conflicts become rare because the branch does not live long enough to drift from main.
Every tool in this stack performs better on small branches. Cypress runs in under three minutes on a focused change. Chromatic surfaces two or three meaningful visual diffs rather than forty. A visual regression on a small diff is straightforward to reason about - you know exactly what you changed and you can see whether the visual delta matches your intent. On a week-old 800-line branch, the same diff is noise.
The branch discipline is not the cherry on top. It is the foundation the rest of this sits on.
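The branch rule is easy to state and easy to let slip, so it can help to turn it into a check rather than rely on habit. Here is a minimal sketch of such a guard in Node. It assumes you feed it the output of git diff --numstat against main and the date of the branch's first commit. The 400-line and two-day limits, and the function names, are hypothetical choices for illustration, not part of any tool.

```javascript
// Hypothetical branch-size guard. MAX_CHANGED_LINES and MAX_AGE_DAYS are
// illustrative assumptions, not settings of any real tool.
const MAX_CHANGED_LINES = 400;
const MAX_AGE_DAYS = 2;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Sums `git diff --numstat` output: one "added<TAB>deleted<TAB>path" line per
// file. Binary files report "-" for both counts and are counted as 0 lines.
function summariseNumstat(numstat) {
  let files = 0;
  let changed = 0;
  for (const line of numstat.split("\n")) {
    if (line.trim() === "") continue;
    const [added, deleted] = line.split("\t");
    files += 1;
    changed += (Number.parseInt(added, 10) || 0) + (Number.parseInt(deleted, 10) || 0);
  }
  return { files, changed };
}

// Returns human-readable problems; an empty array means the branch is
// within both limits. firstCommitAt and now are Date objects.
function checkBranch(numstat, firstCommitAt, now = new Date()) {
  const { files, changed } = summariseNumstat(numstat);
  const ageDays = (now - firstCommitAt) / MS_PER_DAY;
  const problems = [];
  if (changed > MAX_CHANGED_LINES) {
    problems.push(`${changed} changed lines across ${files} files (limit ${MAX_CHANGED_LINES})`);
  }
  if (ageDays > MAX_AGE_DAYS) {
    problems.push(`branch is ${ageDays.toFixed(1)} days old (limit ${MAX_AGE_DAYS})`);
  }
  return problems;
}
```

In CI, the inputs could come from git diff --numstat origin/main...HEAD and git log --reverse --format=%cI origin/main..HEAD, where the first line is the oldest commit on the branch. A non-empty result could fail the job or just post a warning. That choice is a matter of team taste.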
Testing the Back: Cypress for Server Behaviour
Most people reach for Cypress as a browser testing tool. That is a fair use. But I get just as much value from it at the API layer - exercising the server directly before the browser is ever involved.
cy.request() sends HTTP requests straight to your running server. The request never passes through the page: no DOM, no rendering. Just the request lifecycle: middleware, auth guards, database writes, response serialisation.
// cypress/e2e/api/orders.cy.js
describe("Orders API", () => {
  it("returns 401 for unauthenticated requests", () => {
    cy.request({
      method: "GET",
      url: "/api/orders",
      failOnStatusCode: false, // let the test inspect the 401 instead of failing on it
    }).then((res) => {
      expect(res.status).to.eq(401);
    });
  });

  it("creates an order and returns the expected shape", () => {
    cy.loginByApi(); // custom command - POST /api/auth/session with test credentials
    cy.request("POST", "/api/orders", {
      items: [{ productId: "prod_123", quantity: 2 }],
      currency: "GBP",
    }).then((res) => {
      expect(res.status).to.eq(201);
      expect(res.body).to.have.property("orderId");
      expect(res.body.status).to.eq("pending");
    });
  });
});
These tests catch a different class of failure: a changed response shape, a missing auth guard, a 200 where the contract says 201. They run fast - no page loads, no rendering - and when one fails, the location of the problem is unambiguous. The issue is server-side and I know it before I run anything else.
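The same response shape shows up in the API assertions and, later, in the browser fixtures. That makes it worth pulling the shape into one plain function both can call. This is a hypothetical helper, not something Cypress provides. The name contractViolations is an assumption, and so is the rule that orderId must be a non-empty string, since the API test above only checks that the property exists. The status check mirrors that test.

```javascript
// Hypothetical shared contract for POST /api/orders responses.
// status === "pending" mirrors the API test; the orderId type rule is assumed.
const ORDER_CONTRACT = {
  orderId: (value) => typeof value === "string" && value.length > 0,
  status: (value) => value === "pending",
};

// Returns the names of contract fields that are missing or fail their check.
// A null or undefined body is treated as an empty object.
function contractViolations(body, contract = ORDER_CONTRACT) {
  const obj = body ?? {};
  return Object.keys(contract).filter(
    (field) => !Object.hasOwn(obj, field) || !contract[field](obj[field]),
  );
}

console.log(contractViolations({ orderId: "ord_1", status: "pending" })); // []
console.log(contractViolations({ status: "paid" })); // [ 'orderId', 'status' ]
```

Inside the API test, expect(contractViolations(res.body)).to.be.empty could replace the individual property checks. The same call against cy.fixture("order-success.json") would flag a fixture that has drifted from what the server returns, assuming the fixture represents the same freshly created order.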
The other reason I write API tests first is that they define the contract the browser tests work against. When I wire cy.intercept() into the E2E suite, the fixture I use is not a guess - it mirrors a shape I have already confirmed the real server returns.
// cypress/e2e/checkout.cy.js
// findByRole / findByLabelText / findByText come from @testing-library/cypress
describe("Checkout flow", () => {
  beforeEach(() => {
    cy.login(); // custom command
    // Stub the order endpoint with a fixture that mirrors the real response shape
    cy.intercept("POST", "/api/orders", { fixture: "order-success.json" }).as(
      "createOrder",
    );
  });

  it("submits the order and shows confirmation", () => {
    cy.visit("/cart");
    cy.findByRole("button", { name: /proceed to checkout/i }).click();
    cy.findByLabelText("Card number").type("4242424242424242");
    cy.findByLabelText("Expiry").type("12/28");
    cy.findByLabelText("CVC").type("123");
    cy.findByRole("button", { name: /place order/i }).click();
    // Check what the browser actually sent, not just what it rendered
    cy.wait("@createOrder").its("request.body").should("deep.include", {
      currency: "GBP",
    });
    cy.findByText("Order confirmed").should("be.visible");
    cy.url().should("include", "/orders/");
  });
});
The split is deliberate: API tests confirm the contract, browser tests confirm the experience against a known-good contract. In CI I run the API suite first. If an endpoint is broken, there is no point running the full browser flow - the failure is faster to find and the fix location is obvious.
# .github/workflows/ci.yml
- name: Cypress API tests
  run: npx cypress run --spec "cypress/e2e/api/**"
  env:
    CYPRESS_BASE_URL: http://localhost:3000
- name: Cypress E2E tests
  # Top-level specs only, so the API suite in cypress/e2e/api/ is not run a second time
  run: npx cypress run --spec "cypress/e2e/*.cy.js"
  env:
    CYPRESS_BASE_URL: http://localhost:3000
    # Lets Chromatic's Cypress plugin reach Electron over the Chrome DevTools Protocol
    ELECTRON_EXTRA_LAUNCH_ARGS: --remote-debugging-port=9222
Shipping Delight: Chromatic for Visual Confidence
Cypress tells me the application behaves correctly. It does not tell me the application looks correct. That gap is where the expensive regressions live - the shared <Button> with a different line-height after a CSS variable change, the card layout that starts overflowing once the product title is three words longer than the test fixture, the dark mode variant that shipped without anyone looking at it.
I fill that gap with Chromatic. It hooks into the Cypress run itself: while the tests execute, Chromatic's plugin talks to the browser over the Chrome DevTools Protocol and, at the end of each test, captures a full archive of the page - DOM, CSS, fonts, assets. Those archives are uploaded, rendered into snapshots, and pixel-diffed against the baseline from the last accepted build.
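Here is a generic illustration of what pixel-diffing against a baseline involves. It is not Chromatic's actual algorithm, and it makes no claims about how Chromatic compares images. The sketch compares two same-sized RGBA buffers and reports which pixels differ by more than a per-channel threshold. The function name and the threshold parameter are hypothetical.

```javascript
// Generic pixel diff over two same-sized RGBA buffers (4 bytes per pixel).
// Not Chromatic's algorithm; `threshold` is an assumed per-channel tolerance.
function pixelDiff(baseline, candidate, width, height, threshold = 0) {
  const expected = width * height * 4;
  if (baseline.length !== expected || candidate.length !== expected) {
    throw new Error("buffers must both be width * height * 4 bytes");
  }
  const changed = [];
  for (let i = 0; i < expected; i += 4) {
    // A pixel counts as changed if any of its R, G, B, A channels moved too far.
    let differs = false;
    for (let c = 0; c < 4; c++) {
      if (Math.abs(baseline[i + c] - candidate[i + c]) > threshold) {
        differs = true;
        break;
      }
    }
    if (differs) {
      const pixel = i / 4;
      changed.push({ x: pixel % width, y: Math.floor(pixel / width) });
    }
  }
  return { changedPixels: changed.length, ratio: changed.length / (width * height), changed };
}
```

A threshold above zero is one common way to keep font anti-aliasing from registering as a change. The trade-off is that very subtle colour shifts can slip under it.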
bun add --dev chromatic @chromatic-com/cypress
// cypress/support/e2e.js
import "@chromatic-com/cypress/support";
// cypress.config.js
const { defineConfig } = require("cypress");
const { installPlugin } = require("@chromatic-com/cypress");

module.exports = defineConfig({
  e2e: {
    setupNodeEvents(on, config) {
      installPlugin(on, config);
    },
  },
});
In CI, Chromatic processes the archives after the Cypress run completes:
- name: Chromatic visual review
  # By default the CLI exits non-zero while the build has unreviewed visual changes
  run: npx chromatic --cypress -t=${{ secrets.CHROMATIC_PROJECT_TOKEN }}
What this produces is a review queue that is specific and actionable. Not "something changed visually" - but "the checkout confirmation screen changed visually in this test, here is the before, here is the after, here are the exact pixels that shifted." I review it, decide whether the change is intentional, and accept or reject in one click.
The delight I want to highlight is the accept flow. When I ship a genuine UI improvement - tighter spacing, a better hover state, a more readable error message - Chromatic shows me the delta in the diff. I accept it. The baseline updates. That new state is locked in and any future regression from it will be caught automatically. It is a very different relationship with UI changes than landing a PR and hoping QA saw every screen.
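Accepting a change only helps if later builds can find the accepted baseline. Here is a simplified model of that lookup: walk back through the commit graph from the current commit until you reach one with an accepted build. The data shapes and the function name are hypothetical, and Chromatic's real baseline selection is more involved than this sketch.

```javascript
// Simplified baseline lookup: breadth-first walk over the ancestors of `head`
// until a commit with an accepted build is found. `parents` maps a commit to
// the parent commits present in the local clone (hypothetical data shape).
function findBaselineCommit(head, parents, acceptedBuilds) {
  const queue = [...(parents.get(head) ?? [])];
  const seen = new Set(queue);
  while (queue.length > 0) {
    const commit = queue.shift();
    if (acceptedBuilds.has(commit)) return commit;
    for (const parent of parents.get(commit) ?? []) {
      if (!seen.has(parent)) {
        seen.add(parent);
        queue.push(parent);
      }
    }
  }
  return null; // nothing reachable in local history has an accepted build
}

// Full history reaches the accepted build on c1; a depth-1 clone, where c3
// has no known parents, does not.
const full = new Map([["c3", ["c2"]], ["c2", ["c1"]], ["c1", []]]);
const shallow = new Map([["c3", []]]);
console.log(findBaselineCommit("c3", full, new Set(["c1"]))); // c1
console.log(findBaselineCommit("c3", shallow, new Set(["c1"]))); // null
```

The second case is the shallow-clone problem in miniature. With only the head commit available locally, there is no ancestry to walk, so the lookup never reaches the build that was accepted.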
Setting fetch-depth: 0 in the checkout step is mandatory. Chromatic uses git history to find the correct baseline for the current branch. A shallow clone causes it to compare against the wrong commit, which produces meaningless diffs.
- uses: actions/checkout@v4
  with:
    fetch-depth: 0 # full history, so Chromatic can find the right baseline
One More Reviewer That Never Sleeps
Cypress and Chromatic between them tell me the behaviour is correct and the pixels are right. There is still a gap: the code itself. Logic that works correctly in the tests but is brittle under edge cases. Error handling that is missing because the test fixture never triggers a 500. Patterns that technically function but diverge from the conventions the rest of the codebase follows.
I close that gap with an AI reviewer that fires automatically when a PR opens. It reads the diff, the surrounding context, and the test files, and posts a structured review before any human has touched it.
The prompt is deliberately scoped to what the rest of the CI stack cannot see:
# .github/workflows/ai-review.yml
name: AI PR Review
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
jobs:
  ai-review:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
      pull-requests: read
      issues: read
    steps:
      - uses: actions/checkout@v4
        with:
          persist-credentials: false
      - uses: anomalyco/opencode/github@latest
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          model: anthropic/claude-sonnet-4-20250514
          use_github_token: true
          prompt: |
            Review this pull request.
            - Flag missing error handling and unguarded edge cases.
            - Check that any API changes stay consistent with the
              Cypress test suite in cypress/e2e/api/.
            - Highlight anything that could visually regress a
              component Chromatic is already tracking.
            - Note patterns that diverge from codebase conventions.
            Keep comments specific and actionable.
The prompt anchors the AI to the same contract the automated tests enforce. It is not a generic "find bugs" instruction - it targets the things Cypress and Chromatic are blind to: logic paths the tests never hit, error states the fixtures never trigger, drift from the patterns established elsewhere in the codebase.
When a human reviewer opens the PR, the landscape has already been mapped. Cypress checks are green or flagged. Chromatic diffs are queued. The AI review is posted. What remains for the human is the one thing that genuinely requires human judgment: does this change do what it is meant to do, and does it belong in this codebase? On a small branch, that is a fifteen-minute conversation - not because people are cutting corners but because there is genuinely little uncertainty left to resolve.
The Loop, End to End
These practices compound because they are layered in the right order:
1. Branch opens - scoped to one behaviour, one to two days of work
2. Cypress API tests - confirm the server contract: auth guards, response shapes, status codes
3. Cypress E2E tests - confirm user-facing flows against a known-good contract
4. Chromatic - captures full DOM + CSS archives, diffs every screen against the accepted baseline
5. AI reviews the PR - flags code quality, missing error handling, patterns, contract regressions
6. Human reviews - narrow diff, visual diffs ready, AI annotation already posted
7. Merge - all checks green, baselines accepted, nothing waiting to surprise production
The small branch is load-bearing in every step. A large branch makes step 4 noisy - forty visual diffs, most of them intentional but none of them obvious. It makes step 5 produce worse output - an 800-line diff gives the AI reviewer too much surface area and the annotations lose precision. It makes step 6 slow - a week-old diff is not reviewable in good faith in fifteen minutes. It makes step 7 risky - you are merging something you cannot fully hold in your head.
Small branches make the tooling useful, the diffs readable, and the reviews honest.
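The ordering in the list above is what makes a failure cheap: each stage runs only if everything before it passed. Here is that loop as a model in a few lines, not a real CI runner. The names runPipeline and the stage objects are hypothetical. In GitHub Actions, sequential steps in a job behave the same way: once a step fails, later steps are skipped unless they opt in with a condition such as if: always().

```javascript
// Model of the loop as ordered, fail-fast stages. runPipeline and the stage
// shape ({ name, run }) are hypothetical; run() resolves to true on success.
async function runPipeline(stages) {
  const completed = [];
  for (const stage of stages) {
    // Stop at the first failure: later, slower stages never start.
    if (!(await stage.run())) {
      return { ok: false, failedAt: stage.name, completed };
    }
    completed.push(stage.name);
  }
  return { ok: true, failedAt: null, completed };
}

// A broken API contract stops the run before the browser suite starts.
const stage = (name, ok) => ({ name, run: async () => ok });
runPipeline([
  stage("cypress-api", false),
  stage("cypress-e2e", true),
  stage("chromatic", true),
]).then((result) => console.log(result));
// { ok: false, failedAt: 'cypress-api', completed: [] }
```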
What It Feels Like on the Other Side
The anxiety I described at the start does not come from deploying - it comes from not knowing. Not knowing whether the API contract changed under a refactor. Not knowing whether the UI shifted somewhere you did not look. Not knowing whether there is a missing error handler waiting to surface in production. Not knowing whether the branch has grown large enough that something important got lost in the diff.
The loop above removes each of those unknowns in sequence: the API suite tells me the server is honest, the E2E suite tells me the flows work, Chromatic tells me the pixels match intent, the AI review surfaces anything the tests cannot see, and the branch size means I actually understand the surface area I am merging.
The result is that I can merge at the end of a working day without thinking about it again. That is the bar I work to - not just "CI is green" but "I understand exactly what is going in and I would not be surprised by anything that comes out."
Boring deploys are the best deploys.