Why two passes (and why some findings disappear)

ADO Pilot reviews in two passes — a high-recall first sweep, then a critical re-check that rescinds 30 to 50 percent of Pass 1 findings by design.

Last updated 2026-05-05

ADO Pilot reviews each pull request in two passes. The first pass casts a wide net to catch every issue it can. The second pass re-reads each finding in context and discards the ones that do not hold up. Roughly 30 to 50 percent of Pass 1 findings are rescinded before they ever reach your PR, and that rescission rate is by design, not a defect.

Pass 1 — high-recall sweep

Pass 1 is tuned for recall. Its job is to surface anything that might be a problem so Pass 2 has a complete candidate set to work with.

The reviewer reads the entire diff plus enrichment context (Semgrep findings, tree-sitter syntax summaries, linked file context).
It produces an internal list of candidate findings, each with a location, a guessed severity, a category, and an evidence chain that explains the reasoning.
Pass 1 deliberately tolerates false positives. A candidate finding only needs to clear the bar of "this could plausibly be wrong" — not "this is definitely wrong."
None of these candidates are posted to your PR yet. They live only inside the review pipeline.
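
To make the shape of a candidate concrete, here is a minimal sketch of what one Pass 1 record could look like. It is illustrative only: the class name, field names, file path, and category string are assumptions, not ADO Pilot's internal schema, and the severity list contains just the two levels this page mentions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    # Only the two levels named on this page; the real set may be larger.
    BLOCKER = "blocker"
    SUGGESTION = "suggestion"


@dataclass
class Candidate:
    """Hypothetical Pass 1 candidate. All field names are assumptions."""
    path: str                  # location: file in the diff
    line: int                  # location: line in that file
    severity: Severity         # a guess at this stage; Pass 2 may change it
    category: str
    evidence: list[str] = field(default_factory=list)  # reasoning steps, in order


# A candidate that clears the "could plausibly be wrong" bar.
candidate = Candidate(
    path="src/orders.py",
    line=42,
    severity=Severity.BLOCKER,
    category="input-validation",
    evidence=[
        "quantity is read from the request body",
        "no range check is visible before it is used as a multiplier",
    ],
)
```

Keeping the evidence chain as an ordered list is one way to let a later pass check each reasoning step separately rather than accept or reject the finding as a whole.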

While Pass 1 is running, the tracking comment on your PR shows the in-progress placeholder (analyzing N files, expected completion in roughly M minutes) with no finding bullets. That blank state is intentional. Posting Pass 1 findings directly would mean roughly a third to a half of them would later be retracted, which is exactly the noise the two-pass design is built to avoid.

Pass 2 — critical re-check

Pass 2 takes the Pass 1 candidate set and re-evaluates each one with the full picture in view.

It rescinds false positives ("This isn't actually a bug because the caller already validates the input.") and findings that are technically true but too trivial to mention.
It confirms the findings that hold up and writes the user-facing comment text — the one-liner, the evidence prose, and the optional code suggestion.
It adjusts severity when Pass 1 over-classified — for example, downgrading a Pass 1 "blocker" to a "suggestion" once Pass 2 sees that the surrounding code already handles the case.

Only the confirmed, refined findings post to your PR. Everything else is dropped silently.
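
The three Pass 2 outcomes described above (rescind, confirm, adjust severity) can be sketched as a small function. This is a simplified illustration, not the actual implementation: the boolean fields stand in for judgments the reviewer makes by re-reading the code, and every name here is hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate; the booleans stand in for Pass 2's judgment."""
    summary: str
    severity: str                      # "blocker" or "suggestion"
    holds_up: bool                     # False means a false positive
    trivial: bool                      # true but not worth mentioning
    handled_by_surrounding_code: bool  # Pass 1 over-classified


def recheck(c: Candidate) -> Optional[Candidate]:
    """Return the confirmed (possibly downgraded) finding, or None if rescinded."""
    if not c.holds_up or c.trivial:
        return None  # rescinded: dropped silently, never posted
    if c.severity == "blocker" and c.handled_by_surrounding_code:
        return replace(c, severity="suggestion")  # severity adjusted
    return c  # confirmed as-is


candidates = [
    Candidate("unvalidated input", "blocker", False, False, False),
    Candidate("missing null check", "blocker", True, False, True),
    Candidate("unbounded retry loop", "blocker", True, False, False),
]

# Only findings that survive the re-check would be posted.
confirmed = [r for r in map(recheck, candidates) if r is not None]
```

In this example the first candidate is rescinded, the second is downgraded to a suggestion, and the third is confirmed unchanged.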

Why the rescission rate is a feature

A reviewer that is conservative on the first pass ("only flag what you are sure about") misses real bugs and feels timid. A reviewer that is aggressive on the first pass and posts everything floods you with false alarms and trains you to ignore it. Two passes give you both high recall and low noise:

Pass 1 is aggressive, so genuine defects are unlikely to be missed at the candidate stage.
Pass 2 is critical, so the noise floor on your PR stays low.
Both passes share cached system instructions and diff context, so the second pass is cheap.

The result is the rescission rate you should expect to see in your reviews: between roughly 30 and 50 percent of internal candidates are dropped before reaching the PR. If you wired a tool to log Pass 1 candidates and compare against the final review, you would see that gap. From the PR author's perspective, it shows up as a clean, scannable set of findings — never as a "thinking out loud" stream.
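
If you did log Pass 1 candidates, the gap could be measured with a calculation like the one below. This is a sketch under assumptions: the candidate IDs are made up, and this page does not say whether ADO Pilot exposes Pass 1 candidates or stable identifiers for them.

```python
def rescission_rate(pass1_ids: set[str], posted_ids: set[str]) -> float:
    """Fraction of Pass 1 candidates that never reached the PR."""
    if not pass1_ids:
        return 0.0  # nothing was flagged, so nothing was rescinded
    rescinded = pass1_ids - posted_ids
    return len(rescinded) / len(pass1_ids)


# Hypothetical log: five internal candidates, three posted to the PR.
pass1 = {"c1", "c2", "c3", "c4", "c5"}
posted = {"c1", "c3", "c5"}

rate = rescission_rate(pass1, posted)  # 2 of 5 rescinded
print(f"{rate:.0%} of candidates were rescinded")
```

Two of five candidates dropped is a 40 percent rescission rate, inside the 30 to 50 percent range described above.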

Timeline

push
  -> "review queued" comment posts
  -> Pass 1 runs (high-recall sweep, internal candidates only)
  -> "review in progress" comment (still no findings shown)
  -> Pass 2 runs (rescind, refine, write user-facing text)
  -> tracking comment finalizes to PASS, ADVISORY, or FAIL
  -> inline comments post for each confirmed finding
  -> ai-pr-review status check updates

The whole sequence usually runs in 2 to 5 minutes. See "How long does a review take?" for the full timing breakdown.