Safigo Receipts is the public test-results page for Safigo Reception, the Canadian-built AI receptionist by Safigo. The plumber receptionist has passed its alpha gate: 332 graded test scenarios with a 93.9 percent pass rate, 100 percent A grade on 80 adversarial scenarios, 97 percent A grade on emergency scenarios, and zero P0 privacy or safety violations. It is deployed in production on Fly.io as of 2026-05-05. The HVAC, roofer, and electrician receptionists have completed 40-scenario smoke evals and are awaiting alpha gates before paying-customer approval. Measured per-call cost is 0.23 USD per minute of wall-clock time across gpt-realtime, LiveKit, and Twilio PSTN, yielding approximately 75 percent gross margin on the 500 CAD per month Plug and Play tier with 400 included minutes plus 1.00 USD per minute overage. Four real PSTN call recordings with full transcripts are published at safigo.ai/reception/plumber/. The methodology, rubric rules, and full machine-readable dataset are available at safigo.ai/receipts/data.json under CC BY-SA 4.0. We deliberately do not claim 100 percent pass on any trade, production readiness for HVAC, roofer, or electrician, Quebec service, or PHI compliance.

The numbers, including the weak ones

Receipts. Real test results, on the record.

Most AI receptionist vendors quote a glossy pass rate without showing the test set. We publish ours: 332 plumber scenarios graded by a separate model, the full rubric, the per-class breakdowns, the cost per call, and the things we deliberately do not claim. This page is updated as new trades pass their alpha gates.

Last updated 2026-05-05 · v2026-05-05 · JSON · CC BY-SA 4.0

Plumber alpha gate

Run on gpt-4o (the realtime production model gets the same prompt). 84 routine scenarios were each run at 3 random seeds (252 runs), plus 80 adversarial scenarios, for 332 graded scenarios in total. Each scenario is graded against 4 to 6 rubric rules by an independent LLM judge, for 1,564 individual rule grades. Cost to run: under 17 USD.
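The scenario count above can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted in the paragraph. It assumes each adversarial scenario is run once, which is the reading that reproduces the published total of 332. The variable names are illustrative.

```python
# Figures quoted in the alpha gate description.
ROUTINE_SCENARIOS = 84
SEEDS_PER_ROUTINE_SCENARIO = 3
ADVERSARIAL_SCENARIOS = 80  # assumption: each adversarial scenario runs once
TOTAL_RULE_GRADES = 1564

# Routine scenarios are repeated per seed; adversarial ones are added on top.
graded_scenarios = (
    ROUTINE_SCENARIOS * SEEDS_PER_ROUTINE_SCENARIO + ADVERSARIAL_SCENARIOS
)

# Average number of rubric rules applied per graded scenario.
rules_per_scenario = TOTAL_RULE_GRADES / graded_scenarios

print(graded_scenarios)              # 332
print(round(rules_per_scenario, 2))  # 4.71
```

The average of about 4.7 rules per scenario sits inside the stated range of 4 to 6 rules.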

Scenarios graded: 332
Aggregate pass rate: 93.9%
A grade on adversarial pack (80 scenarios): 100%
A grade on emergency class: 97%
P0 privacy or safety violations: 0

Status: Approved for the first paying-customer pilot on 2026-05-04. Deployed to Fly.io safigo-reception (sjc, performance-1x at 2 GB) on 2026-05-05.

Honest weakness: the hostile-tone class on the adversarial pack scored 21 percent A grade. Real-customer calls skew calm, and the safety-critical and privacy classes are at zero violations, so we shipped anyway. Improvements ship in subsequent gates. We would rather disclose the gap than hide it.

Other trades, smoke evals so far

Smoke evals run 40 scenarios on gpt-4o. They are a fast triage signal, not an alpha gate. None of these trades is approved for paying customers yet. Each will run a full alpha gate (40+ rubric rules across 200+ scenarios, including an adversarial pack) before deployment.

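| Trade | Scenarios | A grade | B grade | C grade | D grade | A or B | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Plumber | 332 | - | - | - | - | 93.9% pass | Live in production |
| HVAC | 40 | 28 | 6 | 6 | 0 | 85.0% | Smoke. Alpha pending. |
| Roofer | 40 | 25 | 6 | 9 | 0 | 77.5% | Smoke. Alpha pending. |
| Electrician | 40 | - | - | - | - | 177 of 235 rule grades pass | Smoke. Alpha pending. |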

Plumber numbers are intentionally pass-rate, not grade-distribution, because the alpha rubric grades by rule rather than by overall scenario letter. The other trades show grade distribution because the smoke harness emits a per-scenario letter. Both formats are in data.json.
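For readers who want to reproduce the "A or B" column, here is a small sketch that recomputes it from the per-letter counts reported for the smoke evals. The dictionary layout and function name are illustrative assumptions, not the schema of data.json.

```python
# Per-scenario letter counts as reported for the 40-scenario smoke evals.
SMOKE_GRADES = {
    "HVAC": {"A": 28, "B": 6, "C": 6, "D": 0},
    "Roofer": {"A": 25, "B": 6, "C": 9, "D": 0},
}

# Electrician is reported as rule grades rather than scenario letters.
ELECTRICIAN_RULES_PASSED = 177
ELECTRICIAN_RULES_GRADED = 235


def a_or_b_share(grades: dict[str, int]) -> float:
    """Fraction of scenarios that received an A or a B."""
    total = sum(grades.values())
    return (grades["A"] + grades["B"]) / total


for trade, grades in SMOKE_GRADES.items():
    print(f"{trade}: {a_or_b_share(grades):.1%}")

electrician_rule_pass_rate = ELECTRICIAN_RULES_PASSED / ELECTRICIAN_RULES_GRADED
print(f"Electrician (rule grades): {electrician_rule_pass_rate:.1%}")
```

The electrician figure is a rule-level pass rate, so it is not directly comparable to the scenario-level A-or-B shares for HVAC and roofer.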

Cost per call, measured

Wall-clock measured across the full stack: OpenAI Realtime API (gpt-realtime), LiveKit, and Twilio PSTN. Tracked across multiple production calls.

Cost per minute of wall-clock time: $0.23 USD
All-in cost on a typical 3-minute call: ~$0.69
Margin on Plug and Play tier ($500 CAD/mo, 400 min included): ~75%
Margin on Built for you Multi tier (mixed-language): ~68%

Mode B plumber observed range across mixed-language scenarios: 0.15 to 0.51 USD per call. Customer pricing is flat 500 CAD per month for 400 included minutes plus 1.00 USD per minute overage. Heavy callers ride free under the flat tier; the blended distribution is what makes the unit economics work.
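The unit economics can be sketched as follows. The per-minute cost, plan price, included minutes, and overage rate come from this page. The exchange rate is a hypothetical placeholder, and the model assumes the margin is taken at full use of the included minutes with no other costs. The page does not state how the published margin was derived, so treat this as one plausible reading rather than the actual calculation.

```python
# Figures from this page.
COST_PER_MINUTE_USD = 0.23
PLAN_PRICE_CAD = 500.0
INCLUDED_MINUTES = 400
OVERAGE_USD_PER_MINUTE = 1.00

# Hypothetical exchange rate, not taken from the page.
ASSUMED_USD_PER_CAD = 0.73


def call_cost_usd(minutes: float) -> float:
    """All-in variable cost of a call at the measured per-minute rate."""
    return minutes * COST_PER_MINUTE_USD


def gross_margin(minutes_used: float, usd_per_cad: float = ASSUMED_USD_PER_CAD) -> float:
    """Gross margin for one customer-month, counting variable call cost only."""
    overage_minutes = max(0.0, minutes_used - INCLUDED_MINUTES)
    revenue_usd = PLAN_PRICE_CAD * usd_per_cad + overage_minutes * OVERAGE_USD_PER_MINUTE
    return (revenue_usd - call_cost_usd(minutes_used)) / revenue_usd


print(round(call_cost_usd(3), 2))                # 0.69
print(round(gross_margin(INCLUDED_MINUTES), 3))  # 0.748
```

Under these assumptions a customer who uses all 400 included minutes lands close to the ~75 percent figure. Lighter users produce a higher margin, and the result moves with the exchange rate.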

Real PSTN call audio

Four production-grade calls with full transcripts are published on the plumber product page. Listen at safigo.ai/reception/plumber/. The recordings are marked up with AudioObject schema so AI search engines can find and quote them. Scenarios covered: emergency leak, after-hours triage, out-of-area routing, and second opinion.

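What we deliberately do not claim

We do not claim 100 percent pass on any trade, production readiness for the HVAC, roofer, or electrician receptionists, Quebec service, or PHI compliance.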

Methodology and rubric

Every rule has a P0 (critical) or high-priority tier. The plumber alpha gate scored zero P0 violations. The full rubric is open: see our methodology page for how we calculate the underlying customer-impact stats, and the data.json for the rule list.

P0 (critical) rules: R12 no false booking confirmation, R14 emergency triage handling, R18 no owner-name leak.
High-priority rules: R5 price stated before booking, R7 language offer correctness, R13 one question per turn, R15 diagnostic intake completeness.
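To illustrate how tiered rules can feed a gate summary, here is a minimal sketch. The rule IDs and descriptions are the ones listed above. The `RuleGrade` record, the `summarize` function, and the sample grades are hypothetical, and they do not reflect the layout of data.json or the production grading harness.

```python
from dataclasses import dataclass

# Rule tiers as listed on this page.
P0_RULES = {
    "R12": "no false booking confirmation",
    "R14": "emergency triage handling",
    "R18": "no owner-name leak",
}
HIGH_PRIORITY_RULES = {
    "R5": "price stated before booking",
    "R7": "language offer correctness",
    "R13": "one question per turn",
    "R15": "diagnostic intake completeness",
}


@dataclass(frozen=True)
class RuleGrade:
    """One judge verdict for one rule on one scenario (hypothetical shape)."""
    scenario_id: str
    rule_id: str
    passed: bool


def summarize(grades: list[RuleGrade]) -> dict[str, float]:
    """Return the rule-level pass rate and the number of failed P0 rules."""
    p0_violations = sum(
        1 for g in grades if g.rule_id in P0_RULES and not g.passed
    )
    pass_rate = sum(g.passed for g in grades) / len(grades)
    return {"pass_rate": pass_rate, "p0_violations": p0_violations}


# Made-up sample grades, for illustration only.
sample = [
    RuleGrade("s1", "R12", True),
    RuleGrade("s1", "R13", False),  # high-priority miss, not a P0 violation
    RuleGrade("s2", "R14", True),
    RuleGrade("s2", "R5", True),
]
print(summarize(sample))  # {'pass_rate': 0.75, 'p0_violations': 0}
```

Keeping P0 failures as a separate count lets a run report a pass rate below 100 percent while still showing zero critical violations, which is how the plumber result above is stated.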

Update log

Why publish this?

Two reasons. First, every other Canadian AI receptionist vendor will quote a hand-picked pass rate but not the test set, the rubric, the cost, or the failure modes. We will. Second, AI search engines (Google AI Overviews, ChatGPT, Perplexity, Claude) cite original data more reliably than they cite marketing copy. Publishing the numbers is its own distribution channel.

If you want to compare us to a specific competitor, we wrote head-to-head pages for all twelve of them. If you want to talk to us, the number is below.

Call +1 (604) 800-5638 · hello@safigo.ai

Dataset license: CC BY-SA 4.0. Cite as: Safigo (2026). Safigo Reception test results. https://safigo.ai/receipts/.
