Voice dial-in
What it is
A phone number the exec can give to anyone (“Call +1-415-…”). Attendees dial in, have a normal voice conversation, and hang up. Within ~30 seconds a brief summary lands in Slack.
Under the hood: Twilio handles telephony → Media Stream delivers the audio → Gemini Live does real-time speech-to-structured-text → chief-of-staff takes over.
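To make the Media Stream step more concrete: Twilio sends JSON text frames over a WebSocket, each tagged with an `event` field, and `media` frames carry the call audio as a base64 payload. The sketch below only classifies frames. The type and function names are illustrative and not taken from the Workforce0 codebase, and forwarding audio to Gemini Live is left out.

```typescript
// Hypothetical result type; the field names read from `msg` follow
// Twilio's Media Streams message format.
type StreamEvent =
  | { kind: "start"; streamSid: string; callSid: string }
  | { kind: "media"; payloadBase64: string }
  | { kind: "stop" }
  | { kind: "ignored"; event: string };

// Classify one raw WebSocket text frame received from Twilio.
function parseStreamFrame(raw: string): StreamEvent {
  const msg = JSON.parse(raw);
  switch (msg.event) {
    case "start":
      // Sent once, before any audio; identifies the stream and the call.
      return { kind: "start", streamSid: msg.start.streamSid, callSid: msg.start.callSid };
    case "media":
      // Audio chunk, base64-encoded.
      return { kind: "media", payloadBase64: msg.media.payload };
    case "stop":
      return { kind: "stop" };
    default:
      // e.g. "connected", "mark", or anything this sketch does not handle.
      return { kind: "ignored", event: String(msg.event) };
  }
}
```

A real handler would pass each `media` payload on to the speech-to-text session and finalise the meeting on `stop`.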
Why voice
- Lowest friction. No “share your screen” or “add this person to Google Meet.” The exec gives out a number.
- No recording to deal with. Nothing to upload after.
- Attendee-agnostic. Works for customers, vendors, investors — anyone with a phone.
Setup (once, by the installer)
You’ll need:
- A Twilio account with a voice-capable phone number.
- Your Workforce0 instance reachable from Twilio’s webhooks (i.e. publicly addressable, or tunnelled via Cloudflare Tunnel / ngrok for development).
- A Gemini API key with the Live API enabled (free tier is fine).
Twilio configuration
- In your Twilio console, open the phone number’s settings.
- Under Voice Configuration → A call comes in, set the webhook URL to `https://your-workforce0.example.com/api/voice/incoming`. Method: POST. Primary handler: Webhook.
- Save.
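For orientation, a webhook like this answers Twilio with a TwiML document. The exact response Workforce0 returns is not documented here, so the following is only a sketch: it assumes the handler speaks the greeting with `<Say>` and then hands the audio to a WebSocket with `<Connect><Stream>`. The function names and the stream URL are hypothetical.

```typescript
// Escape text for safe inclusion in XML element content and attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a TwiML document: speak the greeting, then stream call audio to streamUrl.
// streamUrl is an assumption, e.g. wss://your-workforce0.example.com/api/voice/stream.
function buildIncomingTwiml(greeting: string, streamUrl: string): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    `  <Say>${escapeXml(greeting)}</Say>`,
    "  <Connect>",
    `    <Stream url="${escapeXml(streamUrl)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

If a call rings but no greeting plays, comparing what the endpoint actually returns against a document of this shape is a quick way to narrow the problem down.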
Workforce0 configuration
Set in `.env`:

    TWILIO_ACCOUNT_SID=AC...
    TWILIO_AUTH_TOKEN=...
    TWILIO_PHONE_NUMBER=+14155551234
    GEMINI_API_KEY=... # the Live API uses this key

Restart the backend. Verify by calling the number: the handler should answer with a short greeting, and the call should appear on the Meetings page within seconds.
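Before placing a test call, it can help to confirm the four variables are present and plausibly shaped. The helper below is an illustrative sketch, not part of Workforce0; the format checks (SID prefix `AC`, E.164 phone number) are assumptions based on the example values above.

```typescript
// Variable names are the ones listed above; the helper itself is hypothetical.
const REQUIRED_VOICE_VARS = [
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "TWILIO_PHONE_NUMBER",
  "GEMINI_API_KEY",
];

// Return human-readable problems; an empty list means the config looks usable.
function checkVoiceEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VOICE_VARS) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      problems.push(`${name} is not set`);
    }
  }
  const sid = env["TWILIO_ACCOUNT_SID"];
  if (sid && !sid.startsWith("AC")) {
    problems.push("TWILIO_ACCOUNT_SID should start with AC");
  }
  const phone = env["TWILIO_PHONE_NUMBER"];
  // E.164: a plus sign followed by up to 15 digits, no spaces or dashes.
  if (phone && !/^\+[1-9]\d{1,14}$/.test(phone)) {
    problems.push("TWILIO_PHONE_NUMBER should be in E.164 form, e.g. +14155551234");
  }
  return problems;
}
```

In a Node backend this could be called as `checkVoiceEnv(process.env)` at startup, logging any problems it returns.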
Customising the greeting
The greeting (“Hi, you’ve reached Workforce0 for Acme. Your meeting is being captured. Speak normally when ready.”) is in `backend/src/voice/greeting.ts`. Edit and rebuild.
Using it (exec)
Give your attendee the number. Tell them:
- Call normally.
- Speak as they would in any meeting.
- Hang up when they’re done.
That’s it. They don’t need an account, an app, or an invitation.
During the call
- While a call is active, the web UI’s dashboard shows a live indicator with the partial transcript.
- The exec can join the call themselves from the web UI (Twilio dial-out) if they’re the one running the call.
- Calls that run longer than 60 minutes gracefully close and start a new meeting — the chief-of-staff stitches them back together.
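How the chief-of-staff matches the segments of a long call is not described here. One plausible approach, sketched below purely as an assumption, is to group segments that share a caller number and where the next segment starts shortly after the previous one ends. The `CallSegment` shape, the function name, and the two-minute gap are all hypothetical.

```typescript
// Hypothetical shape for one captured call segment.
interface CallSegment {
  id: string;
  caller: string;    // caller number in E.164 form
  startedAt: number; // epoch milliseconds
  endedAt: number;   // epoch milliseconds
}

// Group segments that look like continuations of one conversation:
// same caller, and the next segment starts within maxGapMs of the previous end.
function stitchSegments(segments: CallSegment[], maxGapMs: number = 120000): CallSegment[][] {
  const sorted = [...segments].sort((a, b) => a.startedAt - b.startedAt);
  const groups: CallSegment[][] = [];
  for (const seg of sorted) {
    const match = groups.find((group) => {
      const last = group[group.length - 1];
      return last.caller === seg.caller && seg.startedAt - last.endedAt <= maxGapMs;
    });
    if (match) {
      match.push(seg);
    } else {
      groups.push([seg]);
    }
  }
  return groups;
}
```

Under this heuristic, a redial that takes longer than the gap would produce two separate meetings, which is one way the “multiple briefs per call” symptom could arise.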
After the call
Within ~30 seconds after hang-up:
- The meeting appears on the Meetings page with status `ready`.
- If auto-brief is on, a brief is drafted and posted to Slack.
- You approve, redirect, or pause — same flow as upload.
Cost
Voice runs through two paid services:
- Twilio. ~$0.014/min in the US; varies by country.
- Gemini Live. Counts against your Gemini quota. A 30-minute call is well within the free tier.
See Cost caps to put a ceiling on total monthly spend.
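As a rough worked example of the Twilio side, the sketch below multiplies billed minutes by the US rate quoted above. It assumes each call is billed per started minute (rounded up), ignores the monthly fee for the number itself, and does not model Gemini usage. The function name is illustrative.

```typescript
// US rate taken from the text above; other countries differ.
const TWILIO_US_RATE_PER_MIN = 0.014;

// Estimate Twilio voice cost in USD for a list of call durations in seconds.
// Assumption: each call is billed per started minute, i.e. rounded up.
function estimateTwilioCostUsd(
  callSeconds: number[],
  ratePerMin: number = TWILIO_US_RATE_PER_MIN,
): number {
  const billedMinutes = callSeconds.reduce((sum, s) => sum + Math.ceil(s / 60), 0);
  // Round to four decimal places to suppress floating-point noise.
  return Math.round(billedMinutes * ratePerMin * 10000) / 10000;
}
```

For a single 30-minute call, `estimateTwilioCostUsd([1800])` gives 0.42, so twenty such calls in a month come to about $8.40 at the US rate.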
Security
- The phone number is public: anyone who has it can dial in. If that’s unacceptable, set an allowlist of caller numbers in the dashboard (Settings → Voice → Allowlist). Calls from other numbers hear a polite “we don’t recognize this number” prompt and are then disconnected.
- Recordings are not stored by default. Twilio’s media stream passes through your infra; nothing is persisted to disk. Set `RETAIN_CALL_AUDIO=1` if you want to keep call audio for QA.
- The Gemini Live transcript is stored in your Postgres like any other transcript.
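The allowlist check itself can be pictured as a normalised comparison between the caller’s number and the configured entries. This is a minimal sketch with hypothetical function names; treating an empty allowlist as “allow everyone” is an assumption that mirrors the default behaviour described above.

```typescript
// Reduce a phone number to "+digits" so spaces, dashes, and brackets don't matter.
function normalizeNumber(raw: string): string {
  const digits = raw.replace(/\D/g, "");
  return digits ? `+${digits}` : "";
}

// True if the caller may proceed.
// Assumption: an empty allowlist means the feature is off and everyone is allowed.
function isCallerAllowed(from: string, allowlist: string[]): boolean {
  if (allowlist.length === 0) return true;
  const caller = normalizeNumber(from);
  return caller !== "" && allowlist.some((entry) => normalizeNumber(entry) === caller);
}
```

Normalising both sides means an entry typed as `+1 (415) 555-1234` still matches a caller reported as `+14155551234`.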
Limitations
- English-only today. Gemini Live supports more languages; we haven’t tested the full matrix.
- No participant attribution. If two people are on the call, the transcript doesn’t label who said what. If attribution matters, use a recording app that captures speaker diarization and upload.
- Background noise hurts quality. A speakerphone in a café is rough; a headset in a quiet room gives the best results.
Troubleshooting
| Symptom | Fix |
|---|---|
| Call rings, no greeting | Twilio webhook URL wrong or unreachable. |
| Greeting plays, then silence | Gemini key invalid or rate-limited; check backend logs. |
| Call connects but no transcript | Media Stream URL isn’t reachable from Twilio — check firewall. |
| Multiple briefs generated per call | Auto-brief is on and the 60-minute stitching didn’t match the segments. Report it. |
For everything else, grep the backend logs for `voice:` to find the relevant context.