Why two passes (and why some findings disappear)

ADO Pilot reviews in two passes — a high-recall first sweep, then a critical re-check that rescinds 30 to 50 percent of Pass 1 findings by design.

Last updated

ADO Pilot reviews each pull request in two passes. The first pass casts a wide net to catch every issue it can. The second pass re-reads each finding in context and discards the ones that do not hold up. Roughly 30 to 50 percent of Pass 1 findings are rescinded before they ever reach your PR — and that rescission rate is the design, not a defect.

Pass 1 — high-recall sweep

Pass 1 is tuned for recall. Its job is to surface anything that might be a problem so Pass 2 has a complete candidate set to work with.

  • The reviewer reads the entire diff plus enrichment context (Semgrep findings, tree-sitter syntax summaries, linked file context).
  • It produces an internal list of candidate findings, each with a location, a guessed severity, a category, and an evidence chain that explains the reasoning.
  • Pass 1 deliberately tolerates false positives. A candidate finding only needs to clear the bar of "this could plausibly be wrong" — not "this is definitely wrong."
  • None of these candidates are posted to your PR yet. They live only inside the review pipeline.

While Pass 1 is running, the tracking comment on your PR shows the in-progress placeholder — analyzing N files, expected completion in roughly M minutes — with no finding bullets. That blank state is intentional. Posting Pass 1 findings directly would mean roughly one in three would later be retracted, which is exactly the noise the two-pass design is built to avoid.

Pass 2 — critical re-check

Pass 2 takes the Pass 1 candidate set and re-evaluates each one with the full picture in view.

  • It rescinds false positives and findings that are technically true but too trivial to mention. ("This isn't actually a bug because the caller already validates the input.")
  • It confirms the findings that hold up and writes the user-facing comment text — the one-liner, the evidence prose, and the optional code suggestion.
  • It adjusts severity when Pass 1 over-classified — for example, downgrading a Pass 1 "blocker" to a "suggestion" once Pass 2 sees that the surrounding code already handles the case.

Only the confirmed, refined findings post to your PR. Everything else is dropped silently.

Why the rescission rate is a feature

A reviewer that is conservative on the first pass — "only flag what you are sure about" — misses real bugs and feels timid. A reviewer that is aggressive on the first pass and posts everything floods you with false alarms and trains you to ignore it. Two passes get both:

  • Pass 1 is aggressive, so genuine defects are never silently dropped.
  • Pass 2 is critical, so the noise floor on your PR stays low.
  • Both passes share cached system instructions and diff context, so the second pass is cheap.

The result is the rescission rate you should expect to see in your reviews: between roughly 30 and 50 percent of internal candidates are dropped before reaching the PR. If you wired a tool to log Pass 1 candidates and compare against the final review, you would see that gap. From the PR author's perspective, it shows up as a clean, scannable set of findings — never as a "thinking out loud" stream.

Timeline

push
  -> "review queued" comment posts
  -> Pass 1 runs (high-recall sweep, internal candidates only)
  -> "review in progress" comment (still no findings shown)
  -> Pass 2 runs (rescind, refine, write user-facing text)
  -> tracking comment finalizes to PASS, ADVISORY, or FAIL
  -> inline comments post for each confirmed finding
  -> ai-pr-review status check updates

The whole sequence usually runs in 2 to 5 minutes. See How long does a review take? for the full timing breakdown.