Why two passes (and why some findings disappear)
ADO Pilot reviews in two passes — a high-recall first sweep, then a critical re-check that rescinds 30 to 50 percent of Pass 1 findings by design.
Last updated
ADO Pilot reviews each pull request in two passes. The first pass casts a wide net to catch every issue it can. The second pass re-reads each finding in context and discards the ones that do not hold up. Roughly 30 to 50 percent of Pass 1 findings are rescinded before they ever reach your PR — and that rescission rate is the design, not a defect.
Pass 1 — high-recall sweep
Pass 1 is tuned for recall. Its job is to surface anything that might be a problem so Pass 2 has a complete candidate set to work with.
- The reviewer reads the entire diff plus enrichment context (Semgrep findings, tree-sitter syntax summaries, linked file context).
- It produces an internal list of candidate findings, each with a location, a guessed severity, a category, and an evidence chain that explains the reasoning.
- Pass 1 deliberately tolerates false positives. A candidate finding only needs to clear the bar of "this could plausibly be wrong" — not "this is definitely wrong."
- None of these candidates are posted to your PR yet. They live only inside the review pipeline.
While Pass 1 is running, the tracking comment on your PR shows the in-progress placeholder — analyzing N files, expected completion in roughly M minutes — with no finding bullets. That blank state is intentional. Posting Pass 1 findings directly would mean roughly one in three would later be retracted, which is exactly the noise the two-pass design is built to avoid.
Pass 2 — critical re-check
Pass 2 takes the Pass 1 candidate set and re-evaluates each one with the full picture in view.
- It rescinds false positives and findings that are technically true but too trivial to mention. ("This isn't actually a bug because the caller already validates the input.")
- It confirms the findings that hold up and writes the user-facing comment text — the one-liner, the evidence prose, and the optional code suggestion.
- It adjusts severity when Pass 1 over-classified — for example, downgrading a Pass 1 "blocker" to a "suggestion" once Pass 2 sees that the surrounding code already handles the case.
Only the confirmed, refined findings post to your PR. Everything else is dropped silently.
Why the rescission rate is a feature
A reviewer that is conservative on the first pass — "only flag what you are sure about" — misses real bugs and feels timid. A reviewer that is aggressive on the first pass and posts everything floods you with false alarms and trains you to ignore it. Two passes get both:
- Pass 1 is aggressive, so genuine defects are never silently dropped.
- Pass 2 is critical, so the noise floor on your PR stays low.
- Both passes share cached system instructions and diff context, so the second pass is cheap.
The result is the rescission rate you should expect to see in your reviews: between roughly 30 and 50 percent of internal candidates are dropped before reaching the PR. If you wired a tool to log Pass 1 candidates and compare against the final review, you would see that gap. From the PR author's perspective, it shows up as a clean, scannable set of findings — never as a "thinking out loud" stream.
Timeline
push
-> "review queued" comment posts
-> Pass 1 runs (high-recall sweep, internal candidates only)
-> "review in progress" comment (still no findings shown)
-> Pass 2 runs (rescind, refine, write user-facing text)
-> tracking comment finalizes to PASS, ADVISORY, or FAIL
-> inline comments post for each confirmed finding
-> ai-pr-review status check updates
The whole sequence usually runs in 2 to 5 minutes. See How long does a review take? for the full timing breakdown.