May 22, 2026
When to Keep Tests in CI and When to Move Them Into a Dedicated QA Pipeline
Learn when to keep tests in CI and when to move them to a dedicated QA pipeline, how to design release gating, isolate flaky tests, and build a QA workflow that scales.
Teams usually start with a simple rule: if a test is valuable, put it in CI. That works well at first, until the pipeline slows down, flaky tests block merges, and the suite becomes a mix of fast checks, expensive end-to-end flows, and legacy tests nobody trusts. At that point, the real question is not whether to automate more, but where each test belongs.
The practical decision is whether to keep a test in CI or move it to a dedicated QA pipeline. CI is best for fast feedback and merge gating. A dedicated QA pipeline is better for broader validation, environment-specific checks, longer-running scenarios, and tests that would otherwise punish every developer on every commit. The challenge is designing that split without creating two disconnected quality systems.
This guide breaks down how to decide which checks should block merge, which should run asynchronously, and which should be isolated into a separate QA workflow. It is written for engineering managers, QA leads, DevOps engineers, and SDETs who need a pipeline design that is fast, trustworthy, and maintainable.
Start with the purpose of each pipeline
Before choosing where a test belongs, define what the pipeline is supposed to do.
CI exists to answer a narrow question
Continuous integration is meant to tell you, quickly and reliably, whether the change you just made is safe to merge into the main branch. The idea is simple, and it is the reason continuous integration became foundational in modern software delivery.
A CI pipeline should optimize for:
- Speed, so developers get feedback while the change is still fresh
- Determinism, so failures are meaningful
- Low false-positive rate, so blocked merges are rare and actionable
- Tight coupling to the commit or pull request, so results reflect the exact code under review
If a check cannot satisfy those goals, it may still be valuable, but it may not belong in the merge-blocking path.
A QA pipeline exists to answer a broader question
A dedicated QA pipeline is less about immediate merge safety and more about validating system behavior in a richer context. It can run after merge, on a schedule, on a release candidate, or on demand. It can use fuller environments, longer-lived test data, and more expensive validation.
That broader scope is useful because software testing, in the general sense, is not just one kind of check. It includes unit tests, integration tests, contract tests, end-to-end tests, smoke tests, and exploratory support around them, all of which fall under software testing and test automation.
The key is not to force every test into the same gate.
A good pipeline is not the one with the most checks; it is the one where each check has a clear job.
A simple decision model for test placement
When teams ask whether a test should stay in CI or move to a separate QA pipeline, the best starting point is a set of concrete questions.
1. Does this test need to block merge?
If the answer is yes, the test probably belongs in CI, or at least in the part of CI that runs on every pull request.
Use CI for tests that protect:
- Syntax and compilation issues
- Broken unit-level logic
- Contract mismatches between modules
- Critical API behavior with deterministic setup
- Small numbers of high-value integration checks
2. Is the test fast enough to run often?
A test that takes 30 seconds is very different from one that takes 20 minutes. CI should usually stay under a time budget that keeps developer flow healthy. The exact number varies, but the principle is consistent: if feedback is slow, people start batching changes or ignoring failures.
3. Is the test stable across environments?
If a test depends on fragile timing, third-party services, shared data, or browsers that behave differently across workers, it is a candidate for isolation. That does not automatically mean removing it from CI forever, but it may belong in a separate pipeline until it is hardened.
4. Does the test require expensive infrastructure?
Cross-browser suites, mobile device farms, large datasets, and full-stack system tests can be expensive. If they run on every commit, they can consume disproportionate compute and human attention.
5. What is the consequence of missing this regression before merge?
Not every regression deserves the same urgency. A typo in a report export and a broken checkout flow do not justify the same gating policy. High-severity regressions should be represented in CI. Lower-severity but still important regressions can run in QA after merge.
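The five questions above can be turned into a rough first-pass classifier. The sketch below is only an illustration: the `TestProfile` fields, the `suggestPlacement` function, and the 120-second budget are assumptions made for this example, not values from any particular tool. It returns "review" when the answers pull in opposite directions, because those cases deserve a human decision.

```typescript
// Hypothetical model of the five placement questions; none of these names come from a real tool.
type Severity = "low" | "medium" | "high";
type TestPlacement = "ci" | "qa" | "review";

interface TestProfile {
  name: string;
  mustBlockMerge: boolean;           // question 1
  durationSeconds: number;           // question 2
  stableAcrossEnvironments: boolean; // question 3
  needsExpensiveInfra: boolean;      // question 4
  regressionSeverity: Severity;      // question 5
}

// Assumed per-test budget for the merge gate; tune it to your own pipeline.
const MAX_CI_TEST_SECONDS = 120;

function suggestPlacement(t: TestProfile): TestPlacement {
  const cheap = t.durationSeconds <= MAX_CI_TEST_SECONDS && !t.needsExpensiveInfra;
  const critical = t.mustBlockMerge || t.regressionSeverity === "high";

  if (!t.stableAcrossEnvironments) {
    // Unstable tests should not gate merges; critical ones need hardening first.
    return critical ? "review" : "qa";
  }
  if (critical) {
    // Critical but costly tests are candidates for splitting or slimming down.
    return cheap ? "ci" : "review";
  }
  // Lower-severity checks can run after merge.
  return "qa";
}
```

A fast, stable unit test marked as merge blocking comes back as "ci", while a slow, environment-sensitive flow with medium severity comes back as "qa". Treat the output as a conversation starter for the team, not as a verdict.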
What should stay in CI
CI should be reserved for tests that provide high signal with low cost and low flake rate.
Unit tests
Unit tests are usually the strongest fit for CI because they are fast, deterministic, and focused. They should verify business logic, edge cases, and branching behavior without relying on the network or external services.
Good CI unit tests:
- Run in seconds or a few minutes
- Fail for a clear reason
- Can be debugged from the code diff and stack trace alone
- Are resilient to parallel execution
Contract tests
If your architecture depends on stable boundaries between services or modules, contract tests often belong in CI. They are especially valuable when a team owns an API consumed by multiple services.
A contract failure should usually block merge because it means the code may break downstream consumers. If contract execution is expensive, split the suite so the critical contracts run in CI and the broader compatibility matrix runs in QA.
Narrow integration tests
Not all integration tests are too heavy for CI. A focused set that verifies critical interactions can be a good fit, especially when it uses local containers or ephemeral test environments.
Examples:
- Repository methods against a test database
- Service calls against a stubbed downstream API
- Authentication logic with an ephemeral identity provider fixture
The rule is not “integration tests belong in QA.” The rule is that only the integration tests with a strong signal-to-cost ratio should block merge.
Smoke checks for changed paths
If your build system supports selective execution, you can run smoke tests in CI that cover the code paths affected by the pull request. This is especially useful in monorepos or large services where a full suite would be too slow.
The important part is traceability. Developers should be able to explain why a given check ran and why it blocked the merge.
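One way to get that traceability is to record, for every selected smoke tag, which changed files caused it to run. The following is a minimal sketch under stated assumptions: the path prefixes, the tag names, and the `selectSmokeTags` helper are all hypothetical, and a real build system may offer dependency-aware selection that is more precise than prefix matching.

```typescript
// Hypothetical mapping from source path prefixes to smoke-test tags.
const SMOKE_TAGS_BY_PREFIX: Record<string, string[]> = {
  "src/checkout/": ["@smoke-checkout"],
  "src/auth/": ["@smoke-auth"],
  "src/shared/": ["@smoke-checkout", "@smoke-auth"],
};

interface SmokeSelection {
  tag: string;
  reasons: string[]; // changed files that caused this tag to be selected
}

function selectSmokeTags(changedFiles: string[]): SmokeSelection[] {
  const reasonsByTag: Record<string, string[]> = {};
  for (const file of changedFiles) {
    for (const prefix of Object.keys(SMOKE_TAGS_BY_PREFIX)) {
      if (!file.startsWith(prefix)) continue;
      for (const tag of SMOKE_TAGS_BY_PREFIX[prefix]) {
        // Keep the triggering file so the report can explain why the check ran.
        const reasons = reasonsByTag[tag] ?? (reasonsByTag[tag] = []);
        reasons.push(file);
      }
    }
  }
  return Object.keys(reasonsByTag)
    .sort()
    .map((tag) => ({ tag, reasons: reasonsByTag[tag] }));
}
```

The selected tags could then be passed to the test runner, for example through Playwright's `--grep` option, and the `reasons` list can be printed in the job log so a developer sees which file pulled in which check.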
What usually belongs in a dedicated QA pipeline
A dedicated QA pipeline is the right place for checks that are valuable but not suited to every commit.
Full end-to-end flows
End-to-end tests are useful, but they are often the first thing to overwhelm CI. They depend on multiple services, realistic data, user interface timing, and environment health. That makes them more likely to fail for reasons unrelated to the code change.
If you run E2E flows in CI, keep the set tiny and focused on the most business-critical journeys. Put the rest into QA where they can run after merge or against release candidates.
Cross-browser and device matrix tests
Browser compatibility suites are typically too broad for merge gating. They are valuable for release confidence, but they are usually better suited to a dedicated QA workflow that runs on a schedule or before deployment.
Long-running regression suites
Large regression suites are among the best candidates for QA. They protect accumulated product behavior, but their runtime and maintenance burden often make them impractical for every PR.
The danger is letting these suites drift into a “nice to have” category. If the suite is important enough to maintain, it should have a defined trigger and a defined owner.
Tests against shared or production-like environments
Some checks must run in a more realistic environment to be meaningful, such as those that validate feature flags, third-party integrations, background jobs, or data migration behavior. These are often better off in QA because they need environment orchestration that would be too heavy for CI.
Exploratory support and pre-release validation
A QA pipeline can also support manual investigation, scripted sanity checks, and validation around risky changes. This is where automation and human review complement each other.
A practical release gating strategy
The strongest pipeline designs use multiple layers of confidence instead of one giant gate.
Level 1, fast merge gate in CI
This is the minimum set that must pass before code merges.
Typical contents:
- Linting and formatting
- Unit tests
- A small number of critical contract or integration tests
- Lightweight security or dependency checks where appropriate
Level 2, post-merge QA validation
After merge, run broader checks that do not need to block the commit.
Typical contents:
- End-to-end smoke tests
- Broader regression suites
- Cross-browser tests
- API workflows across multiple environments
Level 3, release candidate or pre-deploy gate
Before shipping to production, run the most expensive or environment-sensitive tests.
Typical contents:
- Full regression against a release candidate
- Migration verification
- Performance sanity checks
- Production-like environment validation
This layered model keeps merge feedback fast while preserving stronger release confidence.
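To make the layering concrete, here is a small sketch of how suite results might be evaluated against the three levels. The `SuiteResult` shape and the `evaluateGates` function are hypothetical, and the rule that a release needs both the merge gate and the release gate to be green is an assumption of this example. Post-merge failures are reported as follow-ups instead of blocking the merge, which mirrors Level 2 above.

```typescript
// Hypothetical result model for the three confidence levels.
type GateLevel = "merge" | "post-merge" | "release";

interface SuiteResult {
  suite: string;
  level: GateLevel;
  passed: boolean;
}

interface GateReport {
  mergeAllowed: boolean;
  releaseAllowed: boolean;
  followUps: string[]; // post-merge failures that need an owner but do not block merge
}

function evaluateGates(results: SuiteResult[]): GateReport {
  const failedAt = (level: GateLevel): string[] =>
    results.filter((r) => r.level === level && !r.passed).map((r) => r.suite);

  // An empty merge gate should not count as a pass.
  const mergeGateRan = results.some((r) => r.level === "merge");
  const mergeAllowed = mergeGateRan && failedAt("merge").length === 0;

  return {
    mergeAllowed,
    // Assumption: a release requires the merge gate and the release gate to be green.
    releaseAllowed: mergeAllowed && failedAt("release").length === 0,
    followUps: failedAt("post-merge"),
  };
}
```

Whether a post-merge failure should also hold a release is a policy choice. This sketch leaves that decision to whoever reads `followUps`.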
How to decide when a test should move out of CI
If a test currently blocks merge but causes pain, do not remove it reflexively. Use a structured review.
Move a test out of CI if it has one or more of these traits
- It is slow enough to materially affect developer throughput
- It flakes often enough to create alert fatigue
- It depends on unstable external systems
- It requires large data fixtures or environment setup
- It verifies a broad scenario that is not essential for every commit
- It duplicates coverage already provided by faster, lower-level tests
Keep it in CI if it has one or more of these traits
- It catches a high-severity defect class quickly
- It is deterministic and easy to debug
- It runs in a small, predictable amount of time
- It covers logic that no other test layer covers well
- It protects a critical path that must not regress unnoticed
A useful question is: If this test failed on a Friday afternoon, would I want every developer blocked until it is fixed? If the answer is yes, keep it in CI. If the answer is no, consider QA.
Flaky test isolation is not the same as test removal
One of the most common mistakes in test pipeline design is treating flaky tests as disposable. They are not disposable; they are misclassified.
Why flaky tests are dangerous in CI
A flaky test in CI hurts the most important thing a CI system provides: trust. Once developers assume failures are random, they stop reacting quickly, and the gate loses its value.
Better handling options
- Quarantine the test temporarily
  - Remove it from merge blocking
  - Run it in QA or a separate flaky-test job
  - Track it with an owner and a fix SLA
- Stabilize the root cause
  - Replace sleeps with explicit waits
  - Remove time-sensitive assumptions
  - Isolate test data
  - Stub unstable dependencies
- Reclassify the test permanently
  - If the test is inherently broad or environment-heavy, it may belong in QA rather than CI
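Deciding which tests to quarantine is easier with a measurable definition of flakiness. The sketch below assumes one common definition: a test flaked on a commit if it both passed and failed against that same commit. The `RunRecord` shape, the function names, and the 5% threshold are hypothetical choices for illustration.

```typescript
// Hypothetical record of one test execution.
interface RunRecord {
  test: string;
  commit: string;
  passed: boolean;
}

// Flake rate per test: the share of commits on which the test both passed and failed.
function flakeRateByTest(runs: RunRecord[]): Record<string, number> {
  const outcomes: Record<string, Record<string, { pass: boolean; fail: boolean }>> = {};
  for (const run of runs) {
    const byCommit = outcomes[run.test] ?? (outcomes[run.test] = {});
    const seen = byCommit[run.commit] ?? (byCommit[run.commit] = { pass: false, fail: false });
    if (run.passed) seen.pass = true;
    else seen.fail = true;
  }

  const rates: Record<string, number> = {};
  for (const test of Object.keys(outcomes)) {
    const commits = Object.keys(outcomes[test]).map((c) => outcomes[test][c]);
    const flakyCommits = commits.filter((c) => c.pass && c.fail).length;
    rates[test] = flakyCommits / commits.length;
  }
  return rates;
}

// Assumed threshold; pick one that matches your team's tolerance.
const QUARANTINE_THRESHOLD = 0.05;

function testsToQuarantine(runs: RunRecord[], threshold = QUARANTINE_THRESHOLD): string[] {
  const rates = flakeRateByTest(runs);
  return Object.keys(rates).filter((t) => rates[t] > threshold).sort();
}
```

A test that fails consistently on a commit is not counted as flaky here, because that pattern points to a real regression and not to nondeterminism.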
Here is an example of a small Playwright check that is appropriate for CI because it is narrow and deterministic:
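```typescript
import { test, expect } from '@playwright/test';

// Assumes `baseURL` is set in the Playwright config so the relative path resolves.
test('add to cart button opens the cart drawer', async ({ page }) => {
  await page.goto('/products/sku-123');
  await page.getByRole('button', { name: 'Add to cart' }).click();
  // The assertion retries until the drawer is visible, so no manual sleep is needed.
  await expect(page.getByRole('dialog', { name: 'Cart' })).toBeVisible();
});
```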
Compare that with a broader multi-step flow that may be better suited to QA if it depends on seeded data, external payment simulation, and multiple redirects. The issue is not that the broader test is bad. It is that it may be too expensive or unstable to justify merge blocking.
Designing a QA workflow that does not become a junk drawer
A dedicated QA pipeline can fail in a different way, by turning into a place where slow or broken tests are dumped and forgotten. Avoid that by defining structure.
Give QA jobs a clear taxonomy
For example:
- qa-smoke, runs after merge on every main-branch update
- qa-regression, runs nightly or on demand
- qa-release-candidate, runs before deployment
- qa-flaky-watch, runs quarantined tests with reporting
Make execution triggers explicit
A QA pipeline should have a known trigger, such as:
- Merge to main
- Nightly schedule
- Release tag
- Manual approval
- Deploy candidate build
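Writing the taxonomy and its triggers down as data makes gaps easy to spot. The sketch below is illustrative only: the `QaJob` shape and the owner names are hypothetical, and the nightly trigger for `qa-flaky-watch` is an assumption, since the trigger for quarantined tests depends on how your team reviews them.

```typescript
type QaTrigger = "merge-to-main" | "nightly" | "release-tag" | "manual" | "deploy-candidate";

interface QaJob {
  name: string;
  triggers: QaTrigger[];
  owner: string; // hypothetical team names, used only for illustration
}

const QA_JOBS: QaJob[] = [
  { name: "qa-smoke", triggers: ["merge-to-main"], owner: "web-platform" },
  { name: "qa-regression", triggers: ["nightly", "manual"], owner: "qa-core" },
  { name: "qa-release-candidate", triggers: ["release-tag", "deploy-candidate"], owner: "release-eng" },
  { name: "qa-flaky-watch", triggers: ["nightly"], owner: "qa-core" }, // assumed trigger
];

// Which jobs should start when a given event fires?
function jobsFor(trigger: QaTrigger, jobs: QaJob[] = QA_JOBS): string[] {
  return jobs.filter((job) => job.triggers.indexOf(trigger) !== -1).map((job) => job.name);
}

// Jobs with no trigger or no owner are the ones most likely to rot.
function findMisconfiguredJobs(jobs: QaJob[] = QA_JOBS): string[] {
  return jobs
    .filter((job) => job.triggers.length === 0 || job.owner.trim() === "")
    .map((job) => job.name);
}
```

Running `findMisconfiguredJobs` as part of pipeline linting is one cheap way to keep a QA job from existing without a trigger or an owner.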
Keep ownership visible
Every test suite should have an owner. If nobody owns the suite, nobody will fix failures or prune stale coverage. Ownership matters even more once tests are moved out of CI because the pain is less visible day to day.
Preserve signal in reporting
A QA pipeline should not just produce a green or red badge. It should show:
- Which tests ran
- What environment they used
- How long they took
- Which failures are new versus known
- Whether a failure blocks release or is informational only
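A report with those properties can be assembled from fairly little data. The following sketch assumes a hypothetical `QaRun` record and a list of already-known failing test names; the field names and the headline format are illustrative, not a standard.

```typescript
interface QaFailure {
  test: string;
  blocksRelease: boolean;
}

interface QaRun {
  environment: string;
  durationSeconds: number;
  testsRun: number;
  failures: QaFailure[];
}

interface QaReport {
  headline: string;
  newBlocking: string[];
  newInformational: string[];
  known: string[];
  releaseBlocked: boolean;
}

function buildQaReport(run: QaRun, knownFailures: string[]): QaReport {
  const knownSet = new Set(knownFailures);
  const fresh = run.failures.filter((f) => !knownSet.has(f.test));
  return {
    headline:
      `${run.testsRun} tests on ${run.environment} in ${run.durationSeconds}s, ` +
      `${run.failures.length} failed (${fresh.length} new)`,
    newBlocking: fresh.filter((f) => f.blocksRelease).map((f) => f.test),
    newInformational: fresh.filter((f) => !f.blocksRelease).map((f) => f.test),
    known: run.failures.filter((f) => knownSet.has(f.test)).map((f) => f.test),
    // A known failure still blocks the release if it is marked as blocking.
    releaseBlocked: run.failures.some((f) => f.blocksRelease),
  };
}
```

Separating new failures from known ones keeps attention on regressions, while `releaseBlocked` still reflects every blocking failure regardless of age.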
Example pipeline split for a web application
Here is a realistic way to split work for a team shipping a web app with an API and browser front end.
CI on pull request
- Lint, typecheck, and build
- Unit tests
- API contract tests
- One or two critical UI checks
- Static analysis and dependency scan
QA after merge to main
- Full browser smoke suite
- Auth and session flow validation
- Payment or checkout path validation in a test environment
- Integration checks that replace stubbed downstream services with real sandbox services
QA nightly
- Full regression suite
- Cross-browser matrix
- Data migration tests
- Quarantined flaky tests, to measure whether they have become stable enough to reintegrate
Release candidate pipeline
- Full regression subset against the candidate build
- Environment-specific config validation
- Rollback verification if applicable
- Manual exploratory sign-off for high-risk areas
This split is often easier to manage than trying to make CI absorb everything. It gives developers quick feedback without losing broader coverage.
Common anti-patterns to avoid
Putting everything in CI because “more coverage is better”
Coverage only helps if the pipeline remains usable. If CI takes too long or fails too often, developers start bypassing it mentally or procedurally.
Moving too much into QA and letting CI become trivial
The opposite problem is also common. If CI only checks formatting and unit tests while serious regressions are left to QA, the merge gate becomes too weak and defects accumulate downstream.
Running the same checks in multiple places without a reason
Duplication is not inherently bad, but it should be intentional. For example, a critical smoke test may run in both CI and QA, but one version may be a narrow PR gate and the other a broader post-merge validation.
Ignoring environment drift
A dedicated QA pipeline often uses different infrastructure than CI. If those environments drift, test results become hard to compare. Define versions, seed data, feature flags, and credentials carefully.
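Drift is easier to manage when each environment publishes a small manifest that can be compared automatically. The sketch below assumes a hypothetical `EnvManifest` shape covering versions, feature flags, and a seed data version. Credentials are deliberately left out, because they should be verified through your secrets tooling and not copied into a manifest.

```typescript
// Hypothetical description of one environment.
interface EnvManifest {
  name: string;
  versions: Record<string, string>;
  featureFlags: Record<string, boolean>;
  seedDataVersion: string;
}

// Compare two key-value maps and describe every key whose value differs.
function diffRecords<T>(label: string, a: Record<string, T>, b: Record<string, T>): string[] {
  const show = (v: T | undefined): string => (v === undefined ? "missing" : String(v));
  const keys = Array.from(new Set(Object.keys(a).concat(Object.keys(b)))).sort();
  const drift: string[] = [];
  for (const key of keys) {
    if (a[key] !== b[key]) {
      drift.push(`${label} ${key}: ${show(a[key])} vs ${show(b[key])}`);
    }
  }
  return drift;
}

function findDrift(ci: EnvManifest, qa: EnvManifest): string[] {
  const drift = [
    ...diffRecords("version", ci.versions, qa.versions),
    ...diffRecords("flag", ci.featureFlags, qa.featureFlags),
  ];
  if (ci.seedDataVersion !== qa.seedDataVersion) {
    drift.push(`seed data: ${ci.seedDataVersion} vs ${qa.seedDataVersion}`);
  }
  return drift;
}
```

Some differences are intentional, such as a flag enabled only in QA. A reasonable next step is an allowlist of expected differences, so that only unplanned drift fails the check.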
Treating test ownership as secondary work
Once a suite leaves CI, it is easy to assume it is “someone else’s problem.” Build ownership and review of flaky tests into the QA workflow.
A decision table you can actually use
| Test characteristic | Keep in CI | Move to QA pipeline |
|---|---|---|
| Runs in seconds | Yes | Sometimes, if broad |
| Fails deterministically | Yes | Yes |
| Depends on multiple services | Maybe, if tightly scoped | Usually |
| Requires browser/device matrix | Rarely | Usually |
| Blocks a high-severity regression | Yes | Sometimes |
| Flaky under normal load | Not until fixed | Yes, temporarily or permanently |
| Needs expensive infrastructure | Rarely | Usually |
| Validates release readiness | Sometimes | Yes |
A good rule of thumb for team discussions
If a test is fast, stable, and directly tied to the change under review, keep it in CI. If a test is broader, slower, environment-sensitive, or mainly valuable for release confidence, move it to a dedicated QA pipeline. If a test is flaky, isolate it first, then decide whether to harden it or reclassify it.
That simple rule is not enough by itself, but it prevents the two most common failures: overloading CI and underfunding QA.
Implementation advice for different team sizes
Small teams
Small teams often need a lean split because they do not have the staffing to maintain a large multi-stage system. Start with:
- Fast unit and contract tests in CI
- One critical smoke test in CI
- A nightly QA suite for broader coverage
Do not create a separate QA pipeline just because it sounds more mature. Create it when CI can no longer stay fast and trustworthy.
Mid-size teams
As ownership expands, pipeline structure matters more. Introduce clear suite boundaries, test tagging, and per-stage service-level expectations. This is usually the point where quarantining flaky tests becomes necessary.
Large teams
Large organizations need pipeline governance as much as automation. Without standards, every team defines “CI” differently. Create shared rules for:
- Maximum CI duration
- What qualifies as merge blocking
- How flaky tests are triaged
- Which environments QA may use
- What requires release-candidate validation
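Shared rules are easier to enforce when at least some of them are checked by a script. The sketch below covers only two of the rules above, maximum CI duration and flake tolerance for merge-blocking suites. The `PipelinePolicy` and `SuiteStats` shapes are hypothetical, and the sketch assumes merge-blocking suites run in parallel, so the slowest suite determines CI duration.

```typescript
interface PipelinePolicy {
  maxCiMinutes: number;
  maxBlockingFlakeRate: number;
}

interface SuiteStats {
  name: string;
  mergeBlocking: boolean;
  typicalMinutes: number;
  flakeRate: number; // fraction between 0 and 1
}

function policyViolations(suites: SuiteStats[], policy: PipelinePolicy): string[] {
  const blocking = suites.filter((s) => s.mergeBlocking);
  const violations: string[] = [];

  // Assumption: blocking suites run in parallel, so the slowest one sets the CI duration.
  const slowest = blocking.reduce((max, s) => Math.max(max, s.typicalMinutes), 0);
  if (slowest > policy.maxCiMinutes) {
    violations.push(`CI duration ${slowest}m exceeds ${policy.maxCiMinutes}m budget`);
  }

  for (const s of blocking) {
    if (s.flakeRate > policy.maxBlockingFlakeRate) {
      violations.push(`${s.name} flake rate ${s.flakeRate} exceeds ${policy.maxBlockingFlakeRate}`);
    }
  }
  return violations;
}
```

Suites that do not block merge are ignored here on purpose. They still need their own expectations, but those belong to the QA stages and not to the merge gate.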
Final checklist for deciding test placement
Before you add a test to CI, ask:
- Does this test need to block merge?
- Is it fast enough to run often?
- Is it deterministic and debuggable?
- Does it protect a critical regression class?
- Can it run without expensive or fragile dependencies?
Before you move a test to QA, ask:
- Does it still provide value if it runs after merge?
- Does it require a broader environment or dataset?
- Is the CI delay cost greater than the merge-blocking value?
- Will someone own the suite and act on failures?
Conclusion
The choice between CI and a dedicated QA pipeline is not about where to dump “less important” tests. It is about matching each test to the feedback loop it serves best.
Keep the fast, stable, high-signal checks in CI, where they can protect merge quality without slowing the team down. Move broad, expensive, or environment-heavy checks into a dedicated QA pipeline, where they can still provide real assurance without turning every pull request into a waiting game. If you handle flaky test isolation deliberately, define release gating clearly, and keep ownership visible, your test pipeline design becomes easier to maintain and far more useful.
The result is a QA workflow that supports both developer speed and release confidence, instead of forcing you to choose one at the expense of the other.