We ran a controlled experiment in March 2026: 5,000 prospects from the same ICP, randomly split into three cohorts of ~1,667 each. Each cohort got the same campaign cadence but a different first-message personalization tier: (1) mail merge with a first-name swap only, (2) hand-written by our top-performing internal SDR (a senior rep, not a junior; this matters for the comparison), (3) multi-source AI synthesis with a voice library trained on her past best messages.
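For concreteness, the three-way randomization can be sketched like this (prospect IDs and the fixed seed are stand-ins; the real list and tooling aren't shown):

```python
import random

# Hypothetical sketch of the three-way random split described above.
# Integer IDs are stand-ins for real prospect records.
prospects = list(range(5000))
random.seed(42)            # fixed seed so the split is reproducible
random.shuffle(prospects)

# Deal shuffled prospects round-robin into three cohorts (~1,667 each)
cohorts = [prospects[i::3] for i in range(3)]
sizes = [len(c) for c in cohorts]
print(sizes)  # [1667, 1667, 1666]
```

Round-robin dealing after a shuffle keeps the cohorts the same size to within one prospect, which matches the ~1,667 split above.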
Reply rates by tier
Mail merge trailed both other tiers by a wide margin. AI synthesis matched and slightly exceeded the hand-written control on positive replies, supporting the claim that AI can match top-performer quality at scale. The pure-AI win was small but consistent.
Meetings booked per cohort
Mail merge: 8 meetings. Hand-written: 41 meetings. AI synthesis: 47 meetings. The AI advantage came from slightly better consistency rather than dramatically better individual messages — the hand-written SDR had 'off' days; the AI didn't.
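As a sanity check not in the original analysis, a quick two-proportion z-test on the meetings-booked counts shows the AI edge over hand-written is directionally real but small relative to sampling noise, consistent with the "small but consistent" framing:

```python
from math import sqrt

# Two-proportion z-test on meetings booked: AI synthesis (47) vs
# hand-written (41), each out of ~1,667 prospects.
n = 1667
p_ai, p_hand = 47 / n, 41 / n
p_pool = (47 + 41) / (2 * n)
se = sqrt(p_pool * (1 - p_pool) * (2 / n))
z = (p_ai - p_hand) / se
print(round(z, 2))  # 0.65, well under the 1.96 significance threshold
```

A z of ~0.65 means the AI-vs-hand-written gap in a single run of this size is within sampling noise; the mail-merge gap, by contrast, is many standard errors wide.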
Time-cost per cohort
The kicker is time. Hand-written took 35 days at ~50 messages/day from one SDR; the AI synthesis run finished in 3 days. The SDR's time was worth $35/hour fully loaded, while the AI run cost ~$28 total in API calls. Per-meeting cost diverged dramatically.
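The arithmetic behind those figures, with loaded hours per day as a labeled assumption (the write-up states only the hourly rate), looks like this; the headline per-meeting costs in the takeaways are slightly higher, presumably because they fold in review and tooling overhead not itemized here:

```python
# Back-of-envelope reproduction of the time-cost numbers.
HOURS_PER_DAY = 8                      # assumed, not stated in the article
sdr_total = 35 * HOURS_PER_DAY * 35    # $/hr x hr/day x 35 days
sdr_per_meeting = sdr_total / 41       # 41 meetings booked
ai_per_meeting = 28 / 47               # ~$28 API total / 47 meetings
print(sdr_total, round(sdr_per_meeting), round(ai_per_meeting, 2))
# 9800 239 0.6
```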
Quality-perception ratings
We asked 50 SaaS operators (not in the cohort) to rate the messages from each tier blind on a 1–5 scale of 'feels hand-written.' Mail merge averaged 1.8 (obvious bot). Hand-written averaged 4.6 (clearly real). AI synthesis averaged 4.4 — statistically indistinguishable from hand-written.
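To see why a 4.6 vs 4.4 gap on 50 raters can wash out statistically, here is a Welch's t-test on hypothetical rating arrays constructed to sit near the reported means (these are illustrative, not the actual rater scores):

```python
from math import sqrt

# Welch's t-statistic, pure stdlib. Rating arrays below are made up to
# approximate the reported means (4.6 vs 4.4); real per-rater data not shown.
def welch_t(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / sqrt(va / len(a) + vb / len(b))

hand = [5, 4, 5, 5, 4, 5, 4, 5]   # mean 4.625 (illustrative)
ai   = [4, 5, 4, 5, 4, 5, 4, 4]   # mean 4.375 (illustrative)
print(round(welch_t(hand, ai), 2))  # 0.97, nowhere near the ~2.0 cutoff
```

With rating variance this size, a 0.2-point mean gap is well inside the noise band, which is what "statistically indistinguishable" means here.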
Key takeaways
- AI synthesis matches top-performer quality at scale. Reply rate gap was statistically zero between AI and a real top SDR; meetings-booked gap favored AI slightly because of consistency.
- Cost per meeting dropped 8x. $245/meeting hand-written vs $31/meeting AI synthesis. The economic case is unambiguous.
- Mail merge is dead for serious outbound. 5-6x fewer meetings than either hand-written or AI. The gap is irrecoverable.
- Quality perception confirms what reply rates show. Operators couldn't distinguish AI-synthesized from hand-written in blind ratings. The 'feels like AI' tell is mostly visible in mail merge, not in real synthesis.
- Voice library is what bridges the gap. Without voice-library training, AI quality drops. With 10+ examples uploaded, the AI matches the source SDR's voice closely enough to fool blind ratings.
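Infonet's actual voice-library mechanism isn't public, but the technique that takeaway describes is standard few-shot prompting; here is a minimal hypothetical sketch (the function name and prompt wording are ours, not the product's):

```python
# Hypothetical few-shot prompt builder: seed the model with a top
# performer's best messages so generated copy inherits their voice.
def build_prompt(examples: list[str], prospect_notes: str) -> str:
    shots = "\n\n".join(f"EXAMPLE {i + 1}:\n{msg}" for i, msg in enumerate(examples))
    return (
        "Write a first outreach message in the exact voice of the examples.\n\n"
        f"{shots}\n\n"
        f"PROSPECT NOTES:\n{prospect_notes}\n\nMESSAGE:"
    )

prompt = build_prompt(
    ["Hey Dana - saw your post on churn dashboards...", "Quick one, Sam..."],
    "VP Sales at a 40-person SaaS, hiring SDRs",
)
```

The 10+ examples threshold in the takeaway maps to the number of shots passed in: more (and better-curated) examples give the model a tighter voice to imitate.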
Run outreach with the data behind it
Infonet ships AI-personalized LinkedIn outreach with dedicated home IPs. From $39/mo per profile.
Start free trial