A healthy CI pipeline has a simple social contract: if a build passes, the codebase is in a releasable state, and if it fails, the team should be able to trust the failure signal. A test quarantine process can protect that contract when a subset of tests becomes unstable, but only if it is designed as a temporary control, not a permanent escape hatch.

The hard part is not deciding whether to quarantine flaky tests. Most teams eventually need some form of test quarantine in CI. The hard part is building governance around it so the team can keep shipping without training itself to ignore red builds. That balance matters for QA leads, SDETs, DevOps engineers, and engineering managers, because once tolerating broken builds becomes habitual, the resulting loss of reliability is usually slow, quiet, and expensive.

This guide explains how to design a CI test quarantine process that isolates instability, keeps signal high, and makes removal the default outcome. We will cover policy, workflow, ownership, metrics, tooling, and the failure modes that turn quarantine from a safety valve into a dumping ground.

What a CI test quarantine process is, and what it is not

A CI test quarantine process is a controlled way to keep unstable tests from blocking unrelated delivery work while the team investigates and repairs them. The main goal is to preserve the usefulness of the pipeline without letting one flaky test block dozens of engineers every day.

That sounds straightforward, but teams often blur three distinct states:

  1. A test that is truly broken because the product is broken.
  2. A test that is flaky because the test or environment is unstable.
  3. A test that is temporarily disabled because the team has chosen not to trust it right now.

These states should not be handled the same way. If you treat every failure as a flaky test, you may hide product regressions. If you treat every failure as a release blocker, you may create so much interruption that people start bypassing CI. A good quarantine process separates signal from noise while keeping both visible.

Quarantine is a control, not a conclusion. It says, “We are not confident enough to let this test gate every build right now,” not, “This test no longer matters.”

At a technical level, the process usually means moving tests into a designated quarantine bucket, changing how failures are reported, and attaching an explicit owner, expiry, and revalidation path.

For background on the underlying concepts, see continuous integration, test automation, and software testing.

Why teams need quarantine in the first place

The temptation is to think quarantine is a sign of failure. In practice, it is often a sign that the team has enough test coverage to expose stability issues that were previously hidden.

Common causes include:

  • Timing-sensitive UI tests that race async rendering or network calls.
  • Shared test data that is mutated by parallel jobs.
  • Environment drift, including browser or dependency version changes.
  • Integration tests that depend on external services with inconsistent latency.
  • Order-dependent tests that pass in isolation and fail in suite runs.
  • Non-deterministic assertions, such as checking timestamps, generated IDs, or eventually consistent state too early.

The worst response is to do nothing and accept that the main branch is often red. The second-worst response is to silence failures without any policy, reporting, or ownership. Both create a culture where people stop believing the pipeline.

A good quarantine process gives teams a practical middle path. It says, “We are not going to let this unstable signal block all development forever, but we are also not going to forget about it.”

Design principles for broken build management

Before you choose implementation details, define the operating principles that the workflow must satisfy.

1. Main branch health is still the priority

Quarantine should reduce false blocking, not lower standards. The main branch or trunk still needs a meaningful signal. If a quarantined test is part of a critical release gate, the team should know exactly what risk it is accepting.

2. Every quarantine needs an owner

If nobody owns the flaky test, nobody will remove it. Assign an accountable owner, usually the team that owns the feature, service, or test suite.

3. Quarantine must be visible

A quarantined test should be easier to find than a healthy one. Hidden exceptions become permanent by accident.

4. Quarantine must expire

Do not create indefinite exceptions. Every quarantined item should have an expiration date, a review date, or a release milestone that forces reevaluation.

5. Reinstatement should be simple

If bringing a test back into the main gate is painful, the organization will delay it. Design the process so reinstating a test is low-friction once the underlying issue is fixed.

6. Separate product failures from test instability

Do not let quarantine become a place to hide real defects. If the test is pointing to an actual product issue, the issue should remain visible through defects, alerts, or tracked incidents.

A practical CI test quarantine process, end to end

A workable process needs a clear lifecycle. Teams that skip this part usually end up with ad hoc Slack messages and brittle manual lists.

Step 1: Detect instability with enough confidence

Do not quarantine a test after a single random failure unless the evidence is overwhelming. Instead, define a rule that combines frequency and recency.

Examples of useful triggers:

  • A test fails three times in the last 20 runs on the main branch.
  • A test fails intermittently across two different runners or agents.
  • A test fails only in CI but passes consistently when rerun locally.
  • A test causes repeated pipeline aborts on unrelated commits.

The point is not to invent a mathematically perfect threshold. The point is to avoid overreacting to one-off environmental noise while still responding before the team loses trust.
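Here is a minimal sketch of the first trigger above (three failures in the last 20 main-branch runs). It assumes the CI system can export per-test results for recent runs. The RunResult shape, the thresholds, and the rule that a test must pass at least once in the window to count as flaky are all illustrative assumptions, not a standard.

```typescript
// Hypothetical record of one main-branch run of a single test.
interface RunResult {
  testId: string;
  runner: string; // CI runner or agent that executed the run
  passed: boolean;
  finishedAt: Date;
}

interface InstabilityVerdict {
  testId: string;
  failuresInWindow: number;
  distinctFailingRunners: number;
  isQuarantineCandidate: boolean;
}

// Illustrative thresholds matching "three failures in the last 20 runs".
const WINDOW_SIZE = 20;
const MIN_FAILURES = 3;

function evaluateInstability(testId: string, history: RunResult[]): InstabilityVerdict {
  // Keep only this test's runs, newest first, limited to the recency window.
  const recent = history
    .filter((r) => r.testId === testId)
    .sort((a, b) => b.finishedAt.getTime() - a.finishedAt.getTime())
    .slice(0, WINDOW_SIZE);

  const failures = recent.filter((r) => !r.passed);
  // Failures spread across runners are reported as evidence for triage.
  const distinctFailingRunners = new Set(failures.map((r) => r.runner)).size;

  // Assumption: a test that never passes in the window is consistently
  // failing, not flaky, so it goes to triage rather than quarantine.
  const hasPass = recent.some((r) => r.passed);

  return {
    testId,
    failuresInWindow: failures.length,
    distinctFailingRunners,
    isQuarantineCandidate: hasPass && failures.length >= MIN_FAILURES,
  };
}
```

The function returns the runner count but does not use it as a gate. That way triage can see whether failures cluster on one agent, which points toward an environment defect.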

Step 2: Triage the failure into a category

Before quarantine, classify the failure so the next action is obvious.

A simple classification model works well:

  • Product defect: the application behavior is wrong.
  • Test defect: the assertion, locator, setup, or data strategy is wrong.
  • Environment defect: the CI runner, network, browser, or external dependency is unstable.
  • Unknown: the team needs more data.

This classification can be stored in a ticket, test metadata, or a quarantine registry. The most important part is that it is searchable and reviewable.
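To keep the classification actionable, it can help to map each category to a default next step. Below is a small sketch with hypothetical names (FailureCategory, TriageRecord, nextAction). The action wording is a placeholder for a team to adapt.

```typescript
// The four categories from the classification model.
type FailureCategory = "product" | "test" | "environment" | "unknown";

// Hypothetical registry entry; field names are illustrative.
interface TriageRecord {
  testId: string;
  category: FailureCategory;
  ticket?: string;
}

// Record<...> forces every category to have an action, so adding a category
// without deciding what happens to it becomes a compile error.
const NEXT_ACTION: Record<FailureCategory, string> = {
  // Real defects stay visible: the test keeps gating and a bug is filed.
  product: "keep blocking; file a product defect",
  test: "quarantine; owning team fixes assertion, locator, setup, or data",
  environment: "quarantine; route to CI or infrastructure owner",
  unknown: "collect more runs and logs, then re-triage",
};

function nextAction(record: TriageRecord): string {
  return `${record.testId}: ${NEXT_ACTION[record.category]}`;
}
```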

Step 3: Move the test to a quarantined execution lane

There are several patterns, and the right one depends on your pipeline architecture.

Pattern A: Exclude quarantined tests from the blocking stage

The main test stage runs the stable suite only. Quarantined tests run in a separate non-blocking job.

This is the simplest option, and it is often the best starting point.

Pattern B: Run quarantined tests in the same pipeline, but do not fail the build on them

This preserves feedback while preventing a single flaky test from blocking merge. It works well when the team wants visibility without interruption.

Pattern C: Split by risk tier

Smoke tests remain fully blocking, critical path tests are blocking with strong ownership, and lower-value or unstable tests are non-blocking until repaired.

This is useful when you need governance that matches business risk.
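The gating rule behind these patterns fits in a few lines. This sketch assumes three hypothetical tiers and treats both smoke and critical-path tests as always blocking, so a quarantine flag only takes effect on the lower tier.

```typescript
// Hypothetical risk tiers for Pattern C.
type Tier = "smoke" | "critical" | "standard";

interface TestOutcome {
  testId: string;
  tier: Tier;
  quarantined: boolean;
  passed: boolean;
}

// Assumption: smoke and critical-path tests ignore the quarantine flag;
// only standard-tier tests stop blocking once quarantined.
function isBlocking(t: TestOutcome): boolean {
  return t.tier !== "standard" || !t.quarantined;
}

// In Patterns A and B alike, the build fails only when a blocking test fails.
function buildPasses(results: TestOutcome[]): boolean {
  return results.every((t) => t.passed || !isBlocking(t));
}
```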

Step 4: Make quarantine status explicit in reporting

Your pipeline output should show which failures are quarantined, which are new, and which are still blocking.

A useful CI summary might show:

  • Passed: 214
  • Failed, blocking: 1
  • Failed, quarantined: 3
  • Newly flaky: 1
  • Re-run required: 2

This visibility matters because if quarantined failures disappear into logs, the process becomes invisible and therefore unreliable.
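A reporting step might derive some of these counters as in the minimal sketch below. It assumes each result already carries a final status after retries and a quarantined flag. The definition of "newly flaky" used here (passed only on retry and not yet quarantined) is an assumption. "Re-run required" is left out because its meaning depends on the pipeline's retry policy.

```typescript
// Hypothetical final status after retries: "flaky" = failed, then passed on retry.
type FinalStatus = "passed" | "failed" | "flaky";

interface SummaryInput {
  testId: string;
  status: FinalStatus;
  quarantined: boolean;
}

interface CiSummary {
  passed: number;
  failedBlocking: number;
  failedQuarantined: number;
  newlyFlaky: number;
}

function summarize(results: SummaryInput[]): CiSummary {
  const s: CiSummary = { passed: 0, failedBlocking: 0, failedQuarantined: 0, newlyFlaky: 0 };
  for (const r of results) {
    if (r.status === "failed") {
      if (r.quarantined) s.failedQuarantined++;
      else s.failedBlocking++;
    } else {
      // Flaky tests count as passed and are also surfaced separately below.
      s.passed++;
      // A pass-after-retry on a test that is not yet quarantined is the
      // earliest sign that a new test is becoming unstable.
      if (r.status === "flaky" && !r.quarantined) s.newlyFlaky++;
    }
  }
  return s;
}

function formatSummary(s: CiSummary): string {
  return [
    `Passed: ${s.passed}`,
    `Failed, blocking: ${s.failedBlocking}`,
    `Failed, quarantined: ${s.failedQuarantined}`,
    `Newly flaky: ${s.newlyFlaky}`,
  ].join("\n");
}
```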

Step 5: Create a repair ticket automatically

Every quarantine event should produce a ticket or work item, even if the fix is not immediate.

At minimum, capture:

  • Test name and suite
  • Failure signature
  • First detected timestamp
  • Recent failure frequency
  • Owning team
  • Quarantine reason
  • Expiration or review date

A ticket without an owner or review date is just bookkeeping. A ticket with those fields becomes an actionable queue.
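A ticket-creation step can enforce the minimum fields before anything is filed. In this sketch the field names are hypothetical and every value is a plain string. The function throws rather than opening an incomplete ticket.

```typescript
// Hypothetical ticket payload mirroring the minimum fields listed above.
// All values are strings to keep the sketch small.
interface QuarantineTicket {
  testName: string;
  suite: string;
  failureSignature: string;
  firstDetected: string;     // ISO 8601 timestamp
  recentFailureRate: string; // e.g. "3/20 main-branch runs"
  owningTeam: string;
  reason: string;
  reviewBy: string;          // ISO 8601 date
}

const REQUIRED_FIELDS: (keyof QuarantineTicket)[] = [
  "testName", "suite", "failureSignature", "firstDetected",
  "recentFailureRate", "owningTeam", "reason", "reviewBy",
];

// Refuse to file an incomplete ticket: without an owner or review date it
// would be bookkeeping rather than an actionable work item.
function buildTicket(fields: Partial<QuarantineTicket>): QuarantineTicket {
  const missing = REQUIRED_FIELDS.filter((key) => {
    const value = fields[key];
    return value === undefined || value.trim() === "";
  });
  if (missing.length > 0) {
    throw new Error(`Missing required ticket fields: ${missing.join(", ")}`);
  }
  return fields as QuarantineTicket;
}

function ticketTitle(t: QuarantineTicket): string {
  return `[quarantine] ${t.suite} / ${t.testName} (review by ${t.reviewBy})`;
}
```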

Step 6: Revalidate on a fixed cadence

Quarantined tests should be rerun in a controlled environment on a schedule that gives useful signal. Common choices include nightly runs, pre-release runs, or post-merge canaries.

The main question is whether the test is still failing because of a real issue or because the environment is unstable. Revalidation should aim to answer that with repeatable evidence, not intuition.
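One way to turn revalidation into repeatable evidence has two rules. First, require a streak of clean scheduled runs before reinstatement. Second, send a test back to triage if it fails every recent run. The threshold of 10 and the three outcomes below are illustrative assumptions that a team would replace with its own agreed values.

```typescript
type RevalidationDecision = "reinstate" | "keep-quarantined" | "retriage";

// Illustrative threshold; the owning team should agree on its own value.
const CLEAN_RUNS_TO_REINSTATE = 10;

// resultsNewestFirst: pass/fail outcomes of scheduled revalidation runs
// (for example nightly or pre-release), most recent first.
function decideRevalidation(resultsNewestFirst: boolean[]): RevalidationDecision {
  // Length of the unbroken run of passes ending at the newest result.
  let streak = 0;
  for (const passed of resultsNewestFirst) {
    if (!passed) break;
    streak++;
  }
  if (streak >= CLEAN_RUNS_TO_REINSTATE) return "reinstate";

  // A full window of failures is deterministic, not flaky: re-triage it as a
  // possible product or test defect instead of leaving it parked.
  const recent = resultsNewestFirst.slice(0, CLEAN_RUNS_TO_REINSTATE);
  if (recent.length === CLEAN_RUNS_TO_REINSTATE && recent.every((passed) => !passed)) {
    return "retriage";
  }
  return "keep-quarantined";
}
```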

How to keep quarantine from becoming permanent

Most teams know they should remove quarantined tests, but few make it easy enough to happen. The difference between a temporary workaround and a permanent policy is usually governance, not intention.

Use an expiry date, not just a status label

A quarantine label without a deadline becomes a shadow policy. Set a default timebox, such as one sprint, two weeks, or the next release train, depending on your cadence.

If the team needs to extend quarantine, require an explicit review. The review should answer three questions:

  • What evidence shows the test is still unstable?
  • What work has been done to fix the root cause?
  • What is the plan and due date for rejoining the blocking suite?
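The timebox and the extension review can both be enforced in code. This sketch assumes a two-week default and a hypothetical Quarantine record. It also applies a rule of its own (not from any standard): a single extension may add at most one more timebox.

```typescript
interface Quarantine {
  testId: string;
  expiresOn: Date;
  extensions: number;
}

// The three review questions, captured as required answers.
interface ExtensionReview {
  evidenceOfInstability: string;
  rootCauseWorkDone: string;
  rejoinPlan: string;
  rejoinBy: Date;
}

// Illustrative default timebox of two weeks.
const DEFAULT_TIMEBOX_DAYS = 14;
const DAY_MS = 24 * 60 * 60 * 1000;

function newQuarantineExpiry(start: Date): Date {
  return new Date(start.getTime() + DEFAULT_TIMEBOX_DAYS * DAY_MS);
}

function extendQuarantine(q: Quarantine, review: ExtensionReview): Quarantine {
  const answers = [review.evidenceOfInstability, review.rootCauseWorkDone, review.rejoinPlan];
  if (answers.some((a) => a.trim() === "")) {
    throw new Error(`Extension for ${q.testId} needs answers to all three review questions`);
  }
  if (review.rejoinBy.getTime() <= q.expiresOn.getTime()) {
    throw new Error("No extension needed: rejoin date is within the current quarantine");
  }
  // Assumption: one extension never adds more than one default timebox.
  const cap = q.expiresOn.getTime() + DEFAULT_TIMEBOX_DAYS * DAY_MS;
  return {
    ...q,
    expiresOn: new Date(Math.min(review.rejoinBy.getTime(), cap)),
    extensions: q.extensions + 1,
  };
}
```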

Limit the number of quarantined tests per area

If one team can quarantine endlessly, quarantine will become the path of least resistance. A soft cap is often enough to force discipline. Not every organization needs a hard limit, but you should know when the exception count is climbing.
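A soft cap is easy to monitor once quarantine entries carry an owner. This sketch uses a hypothetical cap of five per team and returns warnings rather than failing anything.

```typescript
// Minimal view of a quarantine entry; "owner" matches the manifest field name.
interface OwnedEntry {
  id: string;
  owner: string;
}

// Illustrative soft cap; crossing it produces a warning, not a failure.
const SOFT_CAP_PER_TEAM = 5;

function softCapWarnings(entries: OwnedEntry[], cap: number = SOFT_CAP_PER_TEAM): string[] {
  // Count quarantined tests per owning team.
  const counts = new Map<string, number>();
  for (const e of entries) {
    counts.set(e.owner, (counts.get(e.owner) || 0) + 1);
  }
  const warnings: string[] = [];
  counts.forEach((count, owner) => {
    if (count > cap) warnings.push(`${owner}: ${count} quarantined tests (soft cap ${cap})`);
  });
  return warnings;
}
```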

Track age as aggressively as count

Ten quarantined tests added yesterday are less worrying than two tests that have sat untouched for 90 days. Age is a better indicator of normalization than raw count.
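Age is cheap to compute if each entry records when it was quarantined. The sketch below lists entries past a hypothetical 30-day threshold, oldest first, so the most normalized exceptions lead any report.

```typescript
interface AgedEntry {
  id: string;
  owner: string;
  quarantinedOn: Date;
}

interface StaleReport {
  id: string;
  owner: string;
  ageDays: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;
// Illustrative threshold for "sat untouched too long".
const STALE_AFTER_DAYS = 30;

function ageInDays(quarantinedOn: Date, now: Date): number {
  return Math.floor((now.getTime() - quarantinedOn.getTime()) / DAY_MS);
}

function staleQuarantines(entries: AgedEntry[], now: Date, thresholdDays: number = STALE_AFTER_DAYS): StaleReport[] {
  return entries
    .map((e) => ({ id: e.id, owner: e.owner, ageDays: ageInDays(e.quarantinedOn, now) }))
    .filter((r) => r.ageDays >= thresholdDays)
    .sort((a, b) => b.ageDays - a.ageDays); // oldest first
}
```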

Escalate chronic offenders

If a test has been quarantined multiple times, the problem may not be the test itself. It may be an unstable architecture, poor test data design, or a product area that is too brittle for current automation. That is a strategic signal, not a cleanup task.
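Chronic offenders only show up if the team keeps a history of quarantine events rather than just the current list. The sketch below assumes such a log exists; the QuarantineEvent shape is hypothetical. It groups repeat offenders by area, so the escalation lands where the architectural question belongs.

```typescript
// Hypothetical audit log entry written each time a test enters quarantine.
interface QuarantineEvent {
  testId: string;
  area: string;
}

// Illustrative threshold: entering quarantine this many times counts as chronic.
const CHRONIC_THRESHOLD = 3;

function chronicByArea(events: QuarantineEvent[], threshold: number = CHRONIC_THRESHOLD): Map<string, string[]> {
  // Count quarantine entries per test, remembering its area.
  const counts = new Map<string, { area: string; count: number }>();
  for (const e of events) {
    const prev = counts.get(e.testId);
    counts.set(e.testId, { area: e.area, count: (prev ? prev.count : 0) + 1 });
  }
  // Group only the chronic tests by area.
  const byArea = new Map<string, string[]>();
  counts.forEach((v, testId) => {
    if (v.count < threshold) return;
    const list = byArea.get(v.area) || [];
    list.push(testId);
    byArea.set(v.area, list);
  });
  return byArea;
}
```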

Implementation patterns that work in real CI systems

You can implement quarantine in almost any CI platform, but the mechanics matter.

Use tags or metadata to classify tests

A tag like quarantine, flaky, or nonblocking lets the pipeline filter execution without moving code around. This is easy to automate and easy to audit.

For example, in Playwright you might isolate flaky tests with a tag and run them separately:

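import { test, expect } from '@playwright/test';

// The @quarantine tag in the title is what --grep and --grep-invert match on.
// The relative URL below assumes baseURL is set in playwright.config.
test('@quarantine user settings saves preferences', async ({ page }) => {
  await page.goto('/settings');
  await page.getByRole('button', { name: 'Save' }).click();
  await expect(page.getByText('Saved')).toBeVisible();
});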

Then your blocking pipeline can exclude quarantined tests, while a parallel job still executes them for visibility.

Keep quarantine rules in code, not only in a spreadsheet

A spreadsheet is useful for reporting, but the enforcement should live close to the pipeline. If the process depends on manual memory, people will forget to update it.

A GitHub Actions example might separate stable and quarantined jobs:

name: ci

on: [push, pull_request]

jobs:
  stable-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test --grep-invert @quarantine

  quarantined-tests:
    runs-on: ubuntu-latest
    # A failure in this job is reported but does not fail the workflow run.
    continue-on-error: true
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test --grep @quarantine

The important detail is not the specific tool but the enforcement model: the main job fails on real regressions, while the quarantine job reports on unstable tests without blocking merges.

Store quarantine state in a machine-readable source

A YAML file, test manifest, or issue tracker label can act as the source of truth.

Example manifest:

quarantined_tests:
  - id: checkout-save-card
    owner: payments-team
    reason: intermittent timeout in CI only
    expires_on: 2026-01-15
  - id: profile-avatar-upload
    owner: identity-team
    reason: flaky browser permission prompt
    expires_on: 2026-01-22

That structure makes it possible to build a nightly check that warns when a quarantine has expired.

Add a pipeline guard for expired quarantines

If quarantine can expire, the CI system should remind the team when that happens.

A simple guard script can fail a non-blocking notification job, send a Slack alert, or open a ticket when an expired item still exists. The point is to make overdue quarantine impossible to ignore.
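Here is a minimal guard. It assumes the manifest above has already been loaded into objects whose field names match it and whose expires_on values are plain strings. Some YAML loaders turn unquoted dates into date objects, so quote the dates or normalize them first. The function names and the alert wording are hypothetical.

```typescript
// Entry shape mirroring the example manifest (field names copied from it).
interface ManifestEntry {
  id: string;
  owner: string;
  reason: string;
  expires_on: string; // "YYYY-MM-DD"
}

// Parse "YYYY-MM-DD" to a UTC timestamp, rejecting malformed values so a typo
// cannot silently create an open-ended quarantine.
function parseDay(value: string): number {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
  if (!m) throw new Error(`invalid expires_on: ${value}`);
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  const ms = Date.UTC(year, month - 1, day);
  const check = new Date(ms);
  // Reject dates like 2026-02-30 that Date.UTC would silently roll over.
  if (check.getUTCMonth() !== month - 1 || check.getUTCDate() !== day) {
    throw new Error(`invalid expires_on: ${value}`);
  }
  return ms;
}

// One alert line per expired entry; an empty list means nothing is overdue.
function expiredQuarantines(entries: ManifestEntry[], now: Date): string[] {
  // Midnight UTC today; an entry stays valid through its expires_on day.
  const today = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  return entries
    .filter((e) => parseDay(e.expires_on) < today)
    .map((e) => `${e.id} (owner: ${e.owner}) expired on ${e.expires_on}: ${e.reason}`);
}
```

A nightly notification job could run this check and fail, post an alert, or open a ticket whenever the returned list is non-empty. Treating each entry as valid through the end of its expires_on day in UTC is a choice. Pick one convention and document it.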

Metrics that tell you whether quarantine is healthy

You do not need a dashboard with every possible metric. You need a few measures that answer whether quarantine is helping or masking problems.

1. Quarantined test count

Track total count by suite and by owning team. Sudden growth is usually a signal that the system is drifting.

2. Quarantine age distribution

Age is critical. A process with many old quarantines is probably normalizing instability.

3. Mean time to repair a quarantined test

How long does it take from quarantine to restoration? If this grows too long, the process is not temporary anymore.

4. Reintroduction success rate

How often does a test come back from quarantine and stay stable? If most reinstated tests fail again quickly, the underlying fix may be superficial.

5. Ratio of new quarantines to removed quarantines

A healthy program removes at least as many quarantines as it adds over time, assuming the suite is mature.

The best quarantine metric is not how many tests are hidden; it is how quickly the hidden tests become trustworthy again.
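Several of these metrics fall out of a simple per-episode log. The sketch below assumes a hypothetical Episode record with quarantine and reinstatement dates. It also assumes a relapsed flag, set when a reinstated test is quarantined again within whatever window the team chooses. Age distribution and per-team counts can be derived from the same records.

```typescript
// Hypothetical lifecycle record for one quarantine episode.
interface Episode {
  testId: string;
  quarantinedOn: Date;
  reinstatedOn?: Date; // absent while the test is still quarantined
  relapsed?: boolean;  // true if quarantined again soon after reinstatement
}

interface QuarantineMetrics {
  meanDaysToRepair: number | null;
  reintroductionSuccessRate: number | null;
  added: number;   // quarantines started in the period
  removed: number; // quarantines ended in the period
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Repair time and success rate use every reinstated episode passed in;
// added/removed are scoped to the half-open period [from, to).
function computeMetrics(episodes: Episode[], from: Date, to: Date): QuarantineMetrics {
  let repairDaysTotal = 0;
  let reinstatedCount = 0;
  let stayedStable = 0;
  for (const e of episodes) {
    if (e.reinstatedOn === undefined) continue;
    reinstatedCount++;
    repairDaysTotal += (e.reinstatedOn.getTime() - e.quarantinedOn.getTime()) / DAY_MS;
    if (!e.relapsed) stayedStable++;
  }

  const inPeriod = (d: Date | undefined): boolean =>
    d !== undefined && d.getTime() >= from.getTime() && d.getTime() < to.getTime();

  return {
    meanDaysToRepair: reinstatedCount > 0 ? repairDaysTotal / reinstatedCount : null,
    reintroductionSuccessRate: reinstatedCount > 0 ? stayedStable / reinstatedCount : null,
    added: episodes.filter((e) => inPeriod(e.quarantinedOn)).length,
    removed: episodes.filter((e) => inPeriod(e.reinstatedOn)).length,
  };
}
```

Watch for two trends. An added count that keeps exceeding removed signals the drift described above, and so does a falling success rate.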

Common failure modes and how to avoid them

Failure mode: Quarantine becomes a dumping ground

This happens when teams use quarantine as a substitute for root-cause analysis.

Avoid it by requiring owner, reason, and expiry, plus periodic review by QA or engineering leadership.

Failure mode: Real regressions get ignored

If everything flaky is quarantined, there may be no strong signal left.

Avoid it by keeping critical-path tests blocking, even if that means fixing the most valuable tests first.

Failure mode: The same test is quarantined repeatedly

This often means the test is too granular, too brittle, or built on an unstable assumption.

Avoid it by stepping back and asking whether the automation model is appropriate. Sometimes a lower-level API test is more stable than a UI test for the same coverage goal.

Failure mode: Quarantine is manually managed in chat messages

That pattern does not scale.

Avoid it by storing quarantine state in the pipeline or version control, with automation around notifications and expiry.

Failure mode: The team argues about flakiness without data

A process without evidence devolves into opinion.

Avoid it by logging failure signatures, timestamps, environment details, and rerun results so triage can be based on patterns rather than memory.
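Grouping failures by signature is what turns raw logs into patterns. This sketch assumes a hypothetical FailureLog record. Its normalization is deliberately crude, masking hex values, numbers, and quoted strings; real suites usually need rules tuned to their own error messages.

```typescript
// Hypothetical structured log entry for one failed test attempt.
interface FailureLog {
  testId: string;
  message: string;
  runner: string;
  at: Date;
  passedOnRerun: boolean;
}

// Mask volatile parts of an error message so identical failures share a key.
// Hex is masked before plain digits so "0x1f" is not split into pieces.
function failureSignature(message: string): string {
  return message
    .replace(/0x[0-9a-f]+/gi, "<hex>")
    .replace(/\d+/g, "<n>")
    .replace(/"[^"]*"/g, '"<str>"')
    .trim();
}

// Group logs by test and signature; each group keeps the full records, so
// timestamps, runners, and rerun outcomes stay available for triage.
function groupBySignature(logs: FailureLog[]): Map<string, FailureLog[]> {
  const groups = new Map<string, FailureLog[]>();
  for (const log of logs) {
    const key = `${log.testId} :: ${failureSignature(log.message)}`;
    const list = groups.get(key) || [];
    list.push(log);
    groups.set(key, list);
  }
  return groups;
}
```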

A governance model that keeps teams aligned

A CI test quarantine process works best when responsibility is distributed clearly.

QA leads

Define the policy, review exception trends, and ensure quarantined tests are still being treated as defects in need of resolution.

SDETs

Own test architecture, stabilize brittle tests, and improve instrumentation so failures are diagnosable.

DevOps engineers

Keep the CI environment deterministic, reliable, and observable, especially runners, dependencies, containers, and external integrations.

Engineering managers

Protect time for maintenance work, review quarantine aging, and make sure product deadlines do not consume all remediation capacity.

A sustainable model needs explicit capacity for repair work. If all sprint planning assumes new feature delivery and no reliability maintenance, quarantine will fill up no matter how good the policy is.

A decision framework for quarantining a test

Before quarantining, ask these questions:

  1. Is the failure pattern repeatable enough to justify a quarantine?
  2. Is the test blocking unrelated work more often than it provides useful signal?
  3. Does the test cover a critical user flow or a low-risk edge case?
  4. Can the problem be fixed quickly, or does it require deeper investigation?
  5. Will quarantine make the pipeline more trustworthy, not less?

If the answer to question 5 is no, do not quarantine yet. Fix the root cause or change the test design first.
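The questions can also be captured as a structured triage form, so the decision is recorded alongside its reason. The ordering below is one interpretation of this section and the surrounding guidance, not a fixed rule. Question 5 acts as a veto, and critical flows and quick fixes are routed to repair instead of quarantine.

```typescript
// Answers to the five questions, recorded during triage.
interface QuarantineQuestions {
  repeatableFailurePattern: boolean; // Q1
  blocksMoreThanItSignals: boolean;  // Q2
  coversCriticalFlow: boolean;       // Q3
  quickFixAvailable: boolean;        // Q4
  improvesPipelineTrust: boolean;    // Q5
}

interface QuarantineDecision {
  quarantine: boolean;
  reason: string;
}

function decideQuarantine(q: QuarantineQuestions): QuarantineDecision {
  // Q5 is the veto described in the text.
  if (!q.improvesPipelineTrust) return { quarantine: false, reason: "fix the root cause or redesign the test first" };
  if (!q.repeatableFailurePattern) return { quarantine: false, reason: "not enough evidence yet; keep collecting runs" };
  // Assumption: critical flows stay blocking and get fixed first.
  if (q.coversCriticalFlow) return { quarantine: false, reason: "critical flow; keep blocking and prioritize the fix" };
  if (q.quickFixAvailable) return { quarantine: false, reason: "quick fix available; fix instead of quarantining" };
  if (!q.blocksMoreThanItSignals) return { quarantine: false, reason: "still provides more signal than noise" };
  return { quarantine: true, reason: "quarantine with owner, reason, and expiry" };
}
```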

When not to quarantine

There are times when quarantine is the wrong move.

  • When the failure reveals a real production bug.
  • When the test is part of an essential release gate and cannot safely be downgraded.
  • When the team has no ownership model for remediation.
  • When the failure rate is so high that the suite design itself needs rethinking.
  • When the test can be stabilized in a short, targeted fix and quarantining would only delay the work.

A quarantine process should reduce noise, not create permission to delay engineering discipline.

A simple policy template you can adapt

If you need a starting point, use a policy like this:

  • Any test that fails intermittently in CI may be quarantined only after triage.
  • Every quarantine requires an owner, reason, and expiration date.
  • Quarantined tests run in a separate non-blocking job and remain visible in reports.
  • Expired quarantines trigger a review notification.
  • Reinstatement requires a clean rerun threshold agreed by the owning team.
  • Chronic quarantines must be reviewed by QA and engineering management.

This is intentionally simple. The policy only works if the team can follow it consistently.

Final takeaway

A strong CI test quarantine process protects delivery without teaching the team to live with broken builds. That requires more than marking a test as flaky: it requires a full operational model, including triage rules, visibility, ownership, expiry, and a clear path back to the blocking suite.

If you design quarantine as a temporary state with friction around permanence, it becomes a useful part of broken build management. If you design it as an easy way to make the red bar go away, it will eventually erode the meaning of every green build.

The goal is not to eliminate all unstable tests overnight. The goal is to make instability visible, contained, and expensive enough that fixing it remains the easier long-term choice.