Disclosure: Some links below are affiliate links. If you buy through them, this site earns a small commission at no extra cost. Editorial recommendations are never influenced by affiliate rates.
In this article
- The Simplest Comparison: Cost Per 1,000 Words
- What's Actually New in Gemini 3.1 Flash TTS
- Practical Comparison: What Each Tool Does Best
- Gemini 3.1 Flash TTS
- ElevenLabs
- OpenAI TTS
- The Numbers: How Much You'll Actually Spend
- Scenario 1: YouTube Creator (10 videos/month, 2,000 words per video)
- Scenario 2: Podcast Network (4 episodes/week, 3,000 words per episode)
- Scenario 3: Software Company (customer-facing message templates, 200K+ words/month)
- The Watermark Question
- Integration: Where Each Tool Lives
- Real recommendation
- NZ availability
Google launched Gemini 3.1 Flash TTS on 15 April 2026, and it has shaken up the text-to-speech market. For NZ content creators, it's the first real alternative to ElevenLabs that combines low cost, voice control, and an invisible SynthID watermark that identifies the audio as AI-generated.
If you're building voiceovers for YouTube, podcasts, or commercial video, this matters. The tool you pick changes both your audio quality and how much you pay per minute of speech.
The Simplest Comparison: Cost Per 1,000 Words
| Tool | List price (USD) | Est. annual cost (100K words/month) | Voice quality | Voice cloning | Watermarking |
|---|---|---|---|---|---|
| Gemini 3.1 Flash TTS | $1.00 per 1M input tokens + $20 per 1M output tokens | ~NZD $480/year | High (improved) | No | Yes (SynthID) |
| ElevenLabs | ~$300 per 1M characters (Scale tier) | ~NZD $6,480/year | Highest | Yes | No |
| OpenAI TTS | $15 per 1M characters | ~NZD $3,240/year | High | No | No |
| Amazon Polly | $4 per 1M characters (Standard) | ~NZD $864/year | Medium | No | No |
Reality check: on these estimates, ElevenLabs costs more than ten times as much as Gemini 3.1 Flash TTS. OpenAI TTS sits in the middle.
What's Actually New in Gemini 3.1 Flash TTS
Google's announcement (15 April 2026) highlighted three upgrades:
Audio tags for voice control. You can now direct the AI to speak with a "warm and conversational" tone, or "broadcast-style, authoritative" delivery, without re-recording. This works across 70+ languages.
Improved naturalness. The model sounds less robotic than previous Google TTS versions. Pitch, pace, and breath patterns are more human.
SynthID watermarking. Every audio file carries an inaudible watermark that identifies it as AI-generated. This helps deter deepfake abuse and marks your content as synthetically voiced. Some creators see this as a positive; others avoid watermarked tools.
All three reduce production time for creators building video voiceovers, podcast intros, and audiobook narration.
Practical Comparison: What Each Tool Does Best
Gemini 3.1 Flash TTS
Who should use it: Budget-conscious creators, high-volume voiceover projects, anyone who wants voice control without editing.
Strengths:
- Cheapest of the three tools compared here (roughly NZD $0.50 per 1,000 words at modest monthly volumes)
- 70+ languages with consistent quality
- Audio tags mean you can adjust tone without re-generating the entire clip
- Works inside Google AI Studio (free to test), Vertex AI, and Google Vids (free for creators)
- No credit card required to test in AI Studio
Weaknesses:
- No voice cloning (you can't input a sample of your own voice and have it replicated)
- SynthID watermark is present in all output (good for legal clarity, bad if you want the audio to pass as human)
- Audio tags are still new; voice control is basic compared to ElevenLabs' fine-tuning
Best for: YouTube creators with tight budgets, businesses recording customer-facing messages, anyone publishing 50+ voiceover minutes per month.
ElevenLabs
Who should use it: Brands needing professional voice cloning, creators who want premium audio quality, anyone willing to pay for it.
Strengths:
- Gold standard for voice cloning (submit a 1-minute sample, get a custom voice trained in seconds)
- 1000+ pre-trained voices in dozens of languages
- No watermark (nothing embedded in the file flags the audio as AI-generated)
- Conversational AI 2.0 for building voice agents
- High-end sound design (you can control emotion, emphasis, and pacing at a granular level)
Weaknesses:
- Most expensive option (NZD ~$6,480/year at Scale tier for 100K words/month)
- Free tier is limited (10,000 credits/month, roughly 10,000 characters or about 1,500–2,000 words)
- Overkill for simple, straightforward narration
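To sanity-check any credit allowance against your own scripts, here is a minimal sketch that converts credits into an approximate word count. It assumes one credit per character (our understanding of ElevenLabs' standard models; some faster models bill fewer credits per character) and about six characters per English word including spaces. `credits_to_words` is a hypothetical helper, not part of any ElevenLabs SDK.

```python
# Hypothetical helper: estimate how many words of narration a credit allowance covers.
# Assumptions (check against your own plan): 1 credit per character on standard
# models, and ~6 characters per English word including spaces and punctuation.
CHARS_PER_WORD = 6.0


def credits_to_words(credits: int, credits_per_char: float = 1.0) -> float:
    """Convert a credit allowance into an approximate word count."""
    if credits_per_char <= 0:
        raise ValueError("credits_per_char must be positive")
    characters = credits / credits_per_char
    return characters / CHARS_PER_WORD


if __name__ == "__main__":
    # Free-tier allowance at the assumed standard rate, then at a half-credit rate.
    print(f"{credits_to_words(10_000):,.0f} words")
    print(f"{credits_to_words(10_000, credits_per_char=0.5):,.0f} words")
```

If you mostly use a cheaper model, pass its credits-per-character rate instead of the default.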
Best for: Premium video production, character voices in games, audiobook narration where human-quality audio is non-negotiable.
OpenAI TTS
Who should use it: People already using the OpenAI API, teams wanting simplicity with reasonable quality.
Strengths:
- $15 per 1M characters (middle ground between Gemini and ElevenLabs)
- Natural-sounding built-in voices such as Alloy and Echo
- Streaming support (useful for real-time applications)
- Runs on the standard OpenAI API (no separate signup needed if you already have an API account)
Weaknesses:
- Limited voice options (only 6 built-in voices, no cloning)
- Audio quality is noticeably behind ElevenLabs
- No granular tone control (you can't ask for "warm and playful" like you can with Gemini)
- No watermark, which leaves you more legally exposed if the audio is misused (for example, in deepfakes)
Best for: Developers building ChatGPT integrations, teams wanting quick voiceovers without learning a new platform.
The Numbers: How Much You'll Actually Spend
Let's model three typical scenarios:
Scenario 1: YouTube Creator (10 videos/month, 2,000 words per video)
- Monthly words: 20,000
- Gemini 3.1 Flash TTS: NZD ~$5/month = NZD $60/year
- OpenAI TTS: NZD ~$12/month = NZD $144/year
- ElevenLabs: NZD ~$270/month = NZD $3,240/year
Winner: Gemini (50x cheaper than ElevenLabs)
Scenario 2: Podcast Network (4 episodes/week, 3,000 words per episode)
- Monthly words: 52,000 (4 episodes × 3,000 words × 52 weeks ÷ 12 months)
- Gemini 3.1 Flash TTS: NZD ~$13/month = NZD $156/year
- OpenAI TTS: NZD ~$31/month = NZD $372/year
- ElevenLabs: NZD ~$700/month = NZD $8,400/year
Winner: Gemini (54x cheaper than ElevenLabs)
Scenario 3: Software Company (customer-facing message templates, 200K+ words/month)
- Monthly words: 200,000+
- Gemini 3.1 Flash TTS: NZD ~$48/month = NZD $576/year (but hits rate limits)
- OpenAI TTS: NZD ~$120/month = NZD $1,440/year (but limited voices)
- ElevenLabs: NZD ~$2,700/month = NZD $32,400/year (built for scale)
Winner: Depends on quality tolerance. Gemini if you can live with rate limits; ElevenLabs if your brand voice is critical.
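All three scenarios scale linearly with word count, so you can rerun them for your own volume. The sketch below uses per-1,000-word rates back-calculated from the scenario figures above (about NZD 0.25 for Gemini, 0.60 for OpenAI TTS and 13.50 for ElevenLabs). These are derived estimates, not published prices, and `scenario_costs` and `monthly_words_from_weekly` are hypothetical helpers.

```python
# Hypothetical per-1,000-word rates (NZD), back-calculated from the scenario figures.
# Replace them with rates from your own invoices or current price lists.
RATES_NZD_PER_1K_WORDS = {
    "Gemini 3.1 Flash TTS": 0.25,
    "OpenAI TTS": 0.60,
    "ElevenLabs": 13.50,
}


def monthly_words_from_weekly(episodes_per_week: int, words_per_episode: int) -> float:
    """Convert a weekly schedule to an average month (52 weeks / 12 months)."""
    return episodes_per_week * words_per_episode * 52 / 12


def scenario_costs(monthly_words: float) -> dict[str, tuple[float, float]]:
    """Return {tool: (monthly NZD, annual NZD)} for a given monthly word count."""
    costs = {}
    for tool, rate in RATES_NZD_PER_1K_WORDS.items():
        monthly = monthly_words / 1_000 * rate
        costs[tool] = (monthly, monthly * 12)
    return costs


if __name__ == "__main__":
    scenarios = {
        "YouTube creator": 10 * 2_000,
        "Podcast network": monthly_words_from_weekly(4, 3_000),
        "Software company": 200_000,
    }
    for label, words in scenarios.items():
        print(f"{label} ({words:,.0f} words/month)")
        costs = scenario_costs(words)
        for tool, (monthly, annual) in costs.items():
            print(f"  {tool}: NZD {monthly:,.0f}/month, NZD {annual:,.0f}/year")
        ratio = costs["ElevenLabs"][0] / costs["Gemini 3.1 Flash TTS"][0]
        print(f"  ElevenLabs costs {ratio:.0f}x Gemini")
```

Because the model is linear, the ElevenLabs-to-Gemini ratio stays the same at every volume; rate limits and plan tiers, which this sketch ignores, are what actually change the picture at scale.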
The Watermark Question
SynthID is both Gemini's biggest strength and biggest liability.
SynthID means:
- Your audio can be verified as AI-generated via SynthID detection (legal clarity, good for transparency)
- It's harder to use for deceptive deepfakes (Google's stated goal)
- Some creators see it as "cheap" branding (it labels your work as coming from an AI tool)
No watermark means:
- Nothing in the file marks the audio as AI-generated (ElevenLabs, OpenAI)
- Your audience may assume it's a real voiceover artist
- You're legally exposed if you use it for misinformation
For NZ creators, AI voices are becoming normal: YouTube and TikTok audiences are increasingly used to them. If your brand is built on authenticity, watermarked audio might feel wrong. But if you're shipping fast and cost matters, the watermark is simply an honest label.
Integration: Where Each Tool Lives
| Tool | Free test | API | Web UI | Plugin/extension |
|---|---|---|---|---|
| Gemini 3.1 Flash TTS | Yes (Google AI Studio) | Yes (Gemini API) | Yes (Google Vids) | No |
| ElevenLabs | Limited (10K credits) | Yes | Yes (dashboard) | Chrome extension |
| OpenAI TTS | With ChatGPT Pro | Yes (API) | Web + ChatGPT | No |
For NZ creators: Google Vids (free for creators) integrates Gemini TTS directly. If you already have a Google account for YouTube, you can test it without signing up for anything else.
Real recommendation
Use Gemini 3.1 Flash TTS if:
- Your budget is under NZD $500/year
- You're comfortable with AI-watermarked audio
- You need 70+ language support
- You're working in Google's ecosystem (YouTube, Google Vids)
Use ElevenLabs if:
- Your brand depends on undetectable, premium voice quality
- You need voice cloning (custom voice samples)
- You're building commercial software with voice features
- Budget is not your constraint
Use OpenAI TTS if:
- You're already using the OpenAI API
- You want a middle ground (price + quality)
- You need real-time streaming for live applications
For most NZ creators and small businesses, Gemini 3.1 Flash TTS is the no-brainer starting point. It's free to test, it's dramatically cheaper than competitors, and the audio quality is good enough for YouTube, podcasts, and automated messaging. Upgrade to ElevenLabs only when your brand voice becomes non-negotiable.
NZ availability
Gemini 3.1 Flash TTS is available now via Google AI Studio (free tier), Gemini API, and Google Vids. No VPN or special regional access needed. Google's TTS APIs work from New Zealand without additional setup.
Pricing is quoted in USD per token, so your NZD cost moves with the exchange rate. At current exchange rates (USD 1 ≈ NZD 1.65), expect to pay roughly NZD $0.50 per 1,000 words using Gemini, compared to NZD $5+ with ElevenLabs.
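Token pricing makes the per-word cost less obvious than per-character pricing, because you pay for the audio tokens the model produces as well as the text you send. Here is a rough sketch that estimates Gemini's NZD cost per 1,000 words from the USD list prices in the comparison table. Every constant is an assumption: 150 words per minute of narration, 1.3 text tokens per word, and 32 audio tokens per second (the per-second rate Google's Gemini API docs give for audio input; we assume output audio is counted the same way). With those inputs it prints about NZD 0.42 per 1,000 words, so check your real token usage in AI Studio before budgeting.

```python
# Rough NZD cost per 1,000 words for Gemini 3.1 Flash TTS under token pricing.
# Every constant is an assumption for illustration; swap in your own measurements.
USD_TO_NZD = 1.65              # USD 1 ≈ NZD 1.65, as quoted above
TEXT_TOKENS_PER_WORD = 1.3     # assumption: typical English tokenisation
WORDS_PER_MINUTE = 150         # assumption: conversational narration pace
AUDIO_TOKENS_PER_SECOND = 32   # assumption: borrowed from Gemini's audio-input rate


def gemini_nzd_per_1k_words(usd_per_m_input: float = 1.00,
                            usd_per_m_output: float = 20.00) -> float:
    """Estimate the NZD cost of narrating 1,000 words (text tokens in, audio tokens out)."""
    words = 1_000
    input_tokens = words * TEXT_TOKENS_PER_WORD
    audio_seconds = words / WORDS_PER_MINUTE * 60
    output_tokens = audio_seconds * AUDIO_TOKENS_PER_SECOND
    usd = (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000
    return usd * USD_TO_NZD


if __name__ == "__main__":
    print(f"NZD {gemini_nzd_per_1k_words():.2f} per 1,000 words")
```

Because output (audio) tokens make up almost all of the bill, the speaking-pace and tokens-per-second assumptions move the result far more than the input price does.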
Recommended tools for NZ creators:
- Descript pairs AI voice editing with transcription (useful for refining Gemini TTS output; US$12/month base, NZ-friendly)
- Riverside.fm for podcast recording and hosting ($10/month, NZ-friendly)
- Runway AI for video + audio editing integration ($15/month)
Affiliate disclosures: This article references Google (free), OpenAI (API affiliate eligible), ElevenLabs (affiliate: https://try.elevenlabs.io/7w6xpqq7vgq9), Riverside.fm (affiliate), and Runway (affiliate). This site earns no commission on Gemini TTS itself, which is free to test in Google AI Studio.