We ran a controlled experiment in March 2026: 5,000 prospects from the same ICP, randomly split into three cohorts of ~1,667 each. Each cohort got the same campaign cadence but a different first-message personalization tier: (1) mail merge with a first-name swap only, (2) hand-written by our top-performing internal SDR (a senior rep, not a junior; this matters for the comparison), (3) multi-source AI synthesis with a voice library trained on her past best messages.
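For concreteness, the three-way randomization can be sketched like this (prospect IDs and the fixed seed are stand-ins; the real list and tooling aren't shown):

```python
import random

# Hypothetical sketch of the three-way random split described above.
# Integer IDs are stand-ins for real prospect records.
prospects = list(range(5000))
random.seed(42)            # fixed seed so the split is reproducible
random.shuffle(prospects)

# Deal shuffled prospects round-robin into three cohorts (~1,667 each)
cohorts = [prospects[i::3] for i in range(3)]
sizes = [len(c) for c in cohorts]
print(sizes)  # [1667, 1667, 1666]
```

Round-robin dealing after a shuffle keeps the cohorts the same size to within one prospect, which matches the ~1,667 split above.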
Reply rates by tier
Mail merge trailed both other tiers by a wide margin. AI synthesis matched and slightly exceeded the hand-written control on positive replies, supporting the claim that AI can match top-performer quality at scale. The pure-AI win was small but consistent.
Meetings booked per cohort
Mail merge: 8 meetings. Hand-written: 41 meetings. AI synthesis: 47 meetings. The AI advantage came from slightly better consistency rather than dramatically better individual messages — the hand-written SDR had 'off' days; the AI didn't.
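As a sanity check not in the original analysis, a quick two-proportion z-test on the meetings-booked counts shows the AI edge over hand-written is directionally real but small relative to sampling noise, consistent with the "small but consistent" framing:

```python
from math import sqrt

# Two-proportion z-test on meetings booked: AI synthesis (47) vs
# hand-written (41), each out of ~1,667 prospects.
n = 1667
p_ai, p_hand = 47 / n, 41 / n
p_pool = (47 + 41) / (2 * n)
se = sqrt(p_pool * (1 - p_pool) * (2 / n))
z = (p_ai - p_hand) / se
print(round(z, 2))  # 0.65, well under the 1.96 significance threshold
```

A z of ~0.65 means the AI-vs-hand-written gap in a single run of this size is within sampling noise; the mail-merge gap, by contrast, is many standard errors wide.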
Time-cost per cohort
The kicker is time. Hand-written took 35 days at ~50 messages/day from one SDR; the AI synthesis run finished in 3 days. The SDR's time was worth $35/hour fully loaded, while the AI run cost ~$28 total in API calls. Per-meeting cost diverged dramatically.
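The arithmetic behind those figures, with loaded hours per day as a labeled assumption (the write-up states only the hourly rate), looks like this; the headline per-meeting costs in the takeaways are slightly higher, presumably because they fold in review and tooling overhead not itemized here:

```python
# Back-of-envelope reproduction of the time-cost numbers.
HOURS_PER_DAY = 8                      # assumed, not stated in the article
sdr_total = 35 * HOURS_PER_DAY * 35    # $/hr x hr/day x 35 days
sdr_per_meeting = sdr_total / 41       # 41 meetings booked
ai_per_meeting = 28 / 47               # ~$28 API total / 47 meetings
print(sdr_total, round(sdr_per_meeting), round(ai_per_meeting, 2))
# 9800 239 0.6
```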
Quality-perception ratings
We asked 50 SaaS operators (not in the cohort) to rate the messages from each tier blind on a 1–5 scale of 'feels hand-written.' Mail merge averaged 1.8 (obvious bot). Hand-written averaged 4.6 (clearly real). AI synthesis averaged 4.4 — statistically indistinguishable from hand-written.
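To see why a 4.6 vs 4.4 gap on 50 raters can wash out statistically, here is a Welch's t-test on hypothetical rating arrays constructed to sit near the reported means (these are illustrative, not the actual rater scores):

```python
from math import sqrt

# Welch's t-statistic, pure stdlib. Rating arrays below are made up to
# approximate the reported means (4.6 vs 4.4); real per-rater data not shown.
def welch_t(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / sqrt(va / len(a) + vb / len(b))

hand = [5, 4, 5, 5, 4, 5, 4, 5]   # mean 4.625 (illustrative)
ai   = [4, 5, 4, 5, 4, 5, 4, 4]   # mean 4.375 (illustrative)
print(round(welch_t(hand, ai), 2))  # 0.97, nowhere near the ~2.0 cutoff
```

With rating variance this size, a 0.2-point mean gap is well inside the noise band, which is what "statistically indistinguishable" means here.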
Key takeaways
- AI synthesis matches top-performer quality at scale. Reply rate gap was statistically zero between AI and a real top SDR; meetings-booked gap favored AI slightly because of consistency.
- Cost per meeting dropped 8x. $245/meeting hand-written vs $31/meeting AI synthesis. The economic case is unambiguous.
- Mail merge is dead for serious outbound. 5-6x fewer meetings than either hand-written or AI. The gap is irrecoverable.
- Quality perception confirms what reply rates show. Operators couldn't distinguish AI-synthesized from hand-written in blind ratings. The 'feels like AI' tell is mostly visible in mail merge, not in real synthesis.
- Voice library is what bridges the gap. Without voice-library training, AI quality drops. With 10+ examples uploaded, the AI matches the source SDR's voice closely enough to fool blind ratings.
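Infonet's actual voice-library mechanism isn't public, but the technique that takeaway describes is standard few-shot prompting; here is a minimal hypothetical sketch (the function name and prompt wording are ours, not the product's):

```python
# Hypothetical few-shot prompt builder: seed the model with a top
# performer's best messages so generated copy inherits their voice.
def build_prompt(examples: list[str], prospect_notes: str) -> str:
    shots = "\n\n".join(f"EXAMPLE {i + 1}:\n{msg}" for i, msg in enumerate(examples))
    return (
        "Write a first outreach message in the exact voice of the examples.\n\n"
        f"{shots}\n\n"
        f"PROSPECT NOTES:\n{prospect_notes}\n\nMESSAGE:"
    )

prompt = build_prompt(
    ["Hey Dana - saw your post on churn dashboards...", "Quick one, Sam..."],
    "VP Sales at a 40-person SaaS, hiring SDRs",
)
```

The 10+ examples threshold in the takeaway maps to the number of shots passed in: more (and better-curated) examples give the model a tighter voice to imitate.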
Run outreach with the data behind it
Infonet ships AI-personalized LinkedIn outreach with dedicated home IPs. From $39/mo per profile.
Start free trial