
Build Comparison - Compare Test Builds and Track Regressions


Overview

Build Comparison allows you to compare two builds side by side to instantly see what changed - which tests started failing, which got fixed, and which remain stable. Use it to validate releases, debug regressions, and track test stability.

Accessing Build Comparison

  1. Navigate to Insights > Build Insights
  2. Click on any build to open the Build Details page
  3. Select the Compare tab

Selecting Builds to Compare

When you first open the Compare tab, you'll see an empty state prompting you to select a build for comparison.

Build Comparison - Empty State

Click Select build to compare to open the build selection dialog.

Build Selection Dialog

Build Comparison - Select Build Dialog

The dialog provides options to find builds:

Option | Description
Past runs of same build | Shows previous executions of the current build (default)
All Builds | Shows all builds across your account for cross-build comparison
Search | Search bar to find builds by name

Each build in the list displays:

  • Build Name - Full build identifier
  • Duration - Total execution time (e.g., 52m 53s)
  • Test Count - Number of tests executed
  • Execution Timestamp - Execution date and time
  • User - Username that executed the build (e.g., atxSmoke)
  • Results Summary - Quick pass/fail/other counts (🟢 passed, 🔴 failed, ⚫ other)

Select a build and click Compare Builds to run the comparison. The selected build becomes the Compare build, while the current build you navigated from becomes the Base build.

tip

For release validation, select your last stable production build as Base and the release candidate as Compare.


Key Comparison Metrics

Build Comparison - Key Metrics Summary

Understanding Failed Statuses

The following statuses are considered failed statuses: Failed, Error, Lambda Error, Idle Timeout, and Queue Timeout. Change detection is based on whether a test transitions to or from these statuses.

Metric | Description | When to Act
New Failures | Tests not failing in Base but failing in Compare (see details below) | 🚨 Investigate immediately before release - these are regressions
Pass Rate | Percentage of passed tests, with delta (↗ or ↘) from Base | Set release gates (e.g., "Release only if >95%")
Fixed | Tests that failed in Base but passed in Compare | Verify fixes are genuine, not flaky behavior
No Change | Tests with the same non-passing status in both builds | Review for persistent infrastructure issues
Additional Tests | New tests in Compare not present in Base | Confirm new features have test coverage
Dropped Tests | Tests in Base but missing from Compare | ⚠️ Investigate if not intentionally removed

Understanding New Failures

The New Failures metric includes two scenarios:

Scenario | Description | Label in Table
Regression | Test existed in Base with a non-failed status but has a failed status in Compare | New Failure
New test failing | Test did not exist in Base but has a failed status in Compare | New Failure (Additional)

Both scenarios are counted together in the New Failures metric shown in the summary cards and charts. In the Test Instances table, tests that didn't exist in Base are labeled as New Failure (Additional) to help you distinguish regressions in existing tests from failures in newly added tests.
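
To make these categories concrete, here is a minimal Python sketch of the classification logic described in this section, assuming each build's results are exported as a mapping from test identifier to status. The function name, data shapes, and simplifications (any non-failed status is treated as passing) are illustrative assumptions, not the platform's API.

```python
# Minimal sketch of the change-type classification described above (assumed
# logic, not the platform's implementation). Each build's results are given
# as a {test_id: status} dict; status strings mirror the failed statuses
# listed in this article. Any non-failed status is treated as passing here
# for simplicity.

FAILED_STATUSES = {"Failed", "Error", "Lambda Error", "Idle Timeout", "Queue Timeout"}

def classify_changes(base: dict, compare: dict) -> dict:
    """Assign a change type to every test in either build."""
    changes = {}
    for test_id, status in compare.items():
        failed_now = status in FAILED_STATUSES
        if test_id not in base:
            # Test is new in Compare: an additional test, possibly a new failure.
            changes[test_id] = "New Failure (Additional)" if failed_now else "Additional"
            continue
        failed_before = base[test_id] in FAILED_STATUSES
        if failed_now and not failed_before:
            changes[test_id] = "New Failure"        # regression in an existing test
        elif failed_before and not failed_now:
            changes[test_id] = "Fixed"
        elif failed_before and failed_now:
            changes[test_id] = "Consistent Failure"
        else:
            changes[test_id] = "Stable"
    # Tests present in Base but missing from Compare count as dropped.
    for test_id in base:
        if test_id not in compare:
            changes[test_id] = "Dropped"
    return changes

# Example: login_test regressed, checkout_test was fixed, search_test is new and failing.
base = {"login_test": "Passed", "checkout_test": "Failed"}
compare = {"login_test": "Error", "checkout_test": "Passed", "search_test": "Failed"}
print(classify_changes(base, compare))
```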


Results Comparison Chart

Build Comparison - Results Comparison and Status Changes Charts

The horizontal bar chart compares test counts by status between builds:

  • Purple bar: Base build
  • Orange bar: Compare build

If the orange bar is longer for Failed/Error statuses, more tests are failing in the Compare build.

Status Changes Chart

The donut chart categorizes tests by how their status changed:

Category | Description | Action
New Failures | Non-failed → Failed (includes New Failure (Additional)) | Prioritize - check recent code changes
Fixed Instances | Failed → Passed | Verify the fix is stable, not flaky
Stable Instances | Passed → Passed | No action - reliable tests ✓
Consistent Failures | Failed in both builds | Triage - document or fix before release

Test Instances Comparison Table

Build Comparison - Test Instances Comparison Table

Column | Description | Use Case
Test Instances | Test name, spec file, platform, browser | Click to view detailed logs and recordings
Base | Status and duration in the Base build | Reference point for comparison
Compare | Status and duration in the Compare build | Identify status changes at a glance
Duration Change | Time difference (+ slower, - faster) | Flag tests with >30% increase for performance review (see the sketch below)
Change Type | Stable, Status Change, Fixed, New Failure (Additional), etc. | Filter to focus on specific change categories
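
As a rough illustration of the Duration Change rule of thumb, the snippet below computes the relative change for a single test and flags increases above 30%. The duration values are assumed to be in seconds, and the threshold is only an example you can tune per project.

```python
# Illustrative helper for flagging slow-downs between builds.
# Durations are assumed to be in seconds; the 30% threshold follows
# the guidance in this article and can be tuned per project.

def duration_change(base_secs: float, compare_secs: float) -> tuple[float, bool]:
    """Return (percent_change, needs_review) for one test instance."""
    if base_secs <= 0:
        return 0.0, False  # no baseline to compare against
    pct = (compare_secs - base_secs) / base_secs * 100
    return pct, pct > 30.0

# Example: a test that went from 40s to 58s is +45% and flagged for review.
print(duration_change(40.0, 58.0))  # (45.0, True)
```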

Filtering Options

Filter | Description
All | Filter by change type
Search | Find tests by name or spec file
OS | Filter by operating system
Browser | Filter by browser type
Test Tags | Filter by custom tags

tip

Use filters to isolate platform-specific issues. If failures occur only on a specific browser or OS, that knowledge helps you prioritize the fix.


Common Use Cases

Pre-Release Validation

Compare your last stable build (Base) with the release candidate (Compare). Proceed only if New Failures = 0 and pass rate meets standards.
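
As one way to automate this gate in CI, here is a sketch assuming you export the comparison summary (new failure count and pass rate) to a JSON file; the file name, field names, and 95% threshold are placeholders for your own setup, not a documented export format.

```python
# Hypothetical release-gate script: exits non-zero if the comparison
# summary shows regressions or a pass rate below the agreed threshold.
# "comparison_summary.json" and its field names are illustrative only.

import json
import sys

THRESHOLD = 95.0  # example release gate: "Release only if >95%"

with open("comparison_summary.json") as f:
    summary = json.load(f)

new_failures = summary["new_failures"]
pass_rate = summary["pass_rate"]

if new_failures > 0 or pass_rate < THRESHOLD:
    print(f"Blocking release: {new_failures} new failures, pass rate {pass_rate:.1f}%")
    sys.exit(1)

print(f"Release gate passed: pass rate {pass_rate:.1f}%, no new failures")
```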

Debugging a Broken Build

Compare the last passing build (Base) with the failing build (Compare). Review New Failures and use filters to isolate platform-specific issues.

Measuring Stabilization Progress

Compare the sprint-start build (Base) with the latest build (Compare). Use Fixed count and reduced Consistent Failures to demonstrate progress.

Environment Comparison

Compare production build (Base) with staging build (Compare) to identify environment-specific failures.

Cross-Browser Compatibility

Compare Chrome build (Base) with Firefox/Safari builds (Compare) to catch browser-specific issues.


Best Practices

  1. Compare similar test suites - Comparing different test sets leads to misleading Additional/Dropped counts.
  2. Investigate New Failures immediately - These are potential regressions.
  3. Verify Fixed tests - Run them multiple times to confirm stability.
  4. Monitor Duration Changes - Increases >20-30% may indicate performance issues.
  5. Document Consistent Failures - Maintain a list of known, accepted failures.
  6. Establish comparison baselines - Define standard comparison points (last production release, previous nightly, sprint-start).

FAQ

Can I compare builds from different projects? Yes, but for meaningful results, compare builds with similar test suites.

Why are tests showing as "Dropped"? Tests may have been skipped by configuration, failed to execute, or been removed from the suite.

How is Pass Rate calculated? (Passed Tests / Total Tests) × 100. The delta shows the change from Base.
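
For example, a Compare build with 380 passed tests out of 400 total has a Pass Rate of 95%; if the Base build's rate was 97%, the delta displays as ↘ 2 percentage points.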

How far back can I compare? Any two builds within your data retention period.
