Disclosure: Some links below are affiliate links. If you buy through them, this site earns a small commission at no extra cost. Editorial recommendations are never influenced by affiliate rates.

Google launched Gemini 3.1 Flash TTS on 15 April 2026, and it's shaking up the text-to-speech market. For NZ content creators, it's the first real alternative to ElevenLabs that combines affordability, fine-grained voice control, and a built-in watermark that identifies the audio as AI-made.

If you're building voiceovers for YouTube, podcasts, or commercial video, this matters. The tool you pick changes both your audio quality and how much you pay per minute of speech.

The Simplest Comparison: Price and Annual Cost

| Tool | Price (USD) | Annual cost (NZD, at 100K words/month) | Voice quality | Voice cloning | Watermarking |
|---|---|---|---|---|---|
| Gemini 3.1 Flash TTS | $1.00 per 1M input tokens + $20 per 1M output tokens | ~$480 | High (improved) | No | Yes (SynthID) |
| ElevenLabs | ~$300 per 1M characters (Scale tier) | ~$6,480 | Highest | Yes | No |
| OpenAI TTS | $15 per 1M characters | ~$3,240 | High | No | No |
| Amazon Polly | $4 per 1M characters (Standard) | ~$864 | Medium | No | No |

Reality check: on the figures above, Gemini 3.1 Flash TTS is more than ten times cheaper than ElevenLabs for the same volume. OpenAI TTS sits in the middle.
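The table gives annual totals rather than a per-word price, so here is a minimal Python sketch that turns its annual column into NZD per 1,000 words and compares two tools. It simply copies the table's NZD figures and assumes, as the table does, that every tool is processing 100,000 words a month, so it is only as accurate as those figures.

```python
# Annual NZD costs copied from the comparison table above. The table
# prices every tool at the same volume: 100,000 words per month.
ANNUAL_COST_NZD = {
    "Gemini 3.1 Flash TTS": 480,
    "ElevenLabs": 6480,
    "OpenAI TTS": 3240,
    "Amazon Polly": 864,
}

WORDS_PER_YEAR = 100_000 * 12  # 1.2 million words a year


def cost_per_1000_words(annual_cost_nzd: float, words_per_year: int = WORDS_PER_YEAR) -> float:
    """Convert an annual cost into NZD per 1,000 words."""
    return annual_cost_nzd / words_per_year * 1000


def times_cheaper(cheaper: str, pricier: str) -> float:
    """Ratio of two tools' annual costs at the same volume."""
    return ANNUAL_COST_NZD[pricier] / ANNUAL_COST_NZD[cheaper]


if __name__ == "__main__":
    for tool, annual in ANNUAL_COST_NZD.items():
        print(f"{tool:<22} NZD ${cost_per_1000_words(annual):.2f} per 1,000 words")
    ratio = times_cheaper("Gemini 3.1 Flash TTS", "ElevenLabs")
    print(f"Gemini vs ElevenLabs: {ratio:.1f}x cheaper")
```

On the table's numbers this works out to NZD $0.40 per 1,000 words for Gemini and NZD $5.40 for ElevenLabs, a ratio of 13.5.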

What's Actually New in Gemini 3.1 Flash TTS

Google's announcement (15 April 2026) highlighted three upgrades:

  1. Audio tags for voice control. You can now direct the AI to speak with a "warm and conversational" tone, or "broadcast-style, authoritative" delivery, without re-recording. This works across 70+ languages.

  2. Improved naturalness. The model sounds less robotic than previous Google TTS versions. Pitch, pace, and breath patterns are more human.

  3. SynthID watermarking. Every audio file carries an invisible watermark identifying it as AI-generated. This helps counter deepfake abuse and marks your content as synthetically voiced. Some creators see this as a positive; others avoid watermarked tools.

Audio tags and improved naturalness cut production time for creators building video voiceovers, podcast intros, and audiobook narration, and the watermark comes built in with no extra work.

Practical Comparison: What Each Tool Does Best

Gemini 3.1 Flash TTS

Who should use it: Budget-conscious creators, high-volume voiceover projects, anyone who wants voice control without editing.

Strengths:

Weaknesses:

Best for: YouTube creators with tight budgets, businesses recording customer-facing messages, anyone publishing 50+ voiceover minutes per month.

ElevenLabs

Who should use it: Brands needing professional voice cloning, creators who want premium audio quality, anyone willing to pay for it.

Strengths:

Weaknesses:

Best for: Premium video production, character voices in games, audiobook narration where human-quality audio is non-negotiable.

OpenAI TTS

Who should use it: People already using the OpenAI API, teams wanting simplicity with reasonable quality.

Strengths:

Weaknesses:

Best for: Developers building ChatGPT integrations, teams wanting quick voiceovers without learning a new platform.

The Numbers: How Much You'll Actually Spend

Let's model three typical scenarios:

Scenario 1: YouTube Creator (10 videos/month, 2,000 words per video)

Winner: Gemini (50x cheaper than ElevenLabs)

Scenario 2: Podcast Network (4 episodes/week, 3,000 words per episode)

Winner: Gemini (54x cheaper than ElevenLabs)

Scenario 3: Software Company (customer-facing message templates, 200K+ words/month)

Winner: Depends on quality tolerance. Gemini if you can live with rate limits; ElevenLabs if your brand voice is critical.
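Each scenario boils down to a monthly word count multiplied by a per-word rate. The Python sketch below does that arithmetic. The per-1,000-word rates are an assumption derived from the comparison table's annual figures (annual cost divided by 1.2 million words, times 1,000), a month is taken as 52/12 weeks, and Scenario 3 uses the 200,000-word lower bound.

```python
# ASSUMED per-1,000-word rates in NZD, derived from the comparison table's
# annual figures (annual cost / 1.2M words * 1,000). Not quoted prices.
RATE_NZD_PER_1000_WORDS = {
    "Gemini 3.1 Flash TTS": 0.40,
    "ElevenLabs": 5.40,
    "OpenAI TTS": 2.70,
}

WEEKS_PER_MONTH = 52 / 12  # about 4.33 weeks in an average month


def monthly_words(units_per_month: float, words_per_unit: int) -> float:
    """Words of script per month for a given output cadence."""
    return units_per_month * words_per_unit


SCENARIOS = {
    "YouTube creator": monthly_words(10, 2_000),                   # 10 videos/month
    "Podcast network": monthly_words(4 * WEEKS_PER_MONTH, 3_000),  # 4 episodes/week
    "Software company": 200_000,                                   # lower bound of "200K+"
}


def monthly_cost(words: float, rate_per_1000: float) -> float:
    """NZD cost for a month's words at a flat per-1,000-word rate."""
    return words / 1000 * rate_per_1000


if __name__ == "__main__":
    for name, words in SCENARIOS.items():
        costs = ", ".join(
            f"{tool}: NZD ${monthly_cost(words, rate):,.2f}"
            for tool, rate in RATE_NZD_PER_1000_WORDS.items()
        )
        print(f"{name} ({words:,.0f} words/month) -> {costs}")
```

Because every tool is charged a flat per-word rate here, the cost ratio between any two tools is identical in every scenario; only the dollar amounts scale with volume.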

The Watermark Question

SynthID is both Gemini's biggest strength and biggest liability.

SynthID means:

No watermark means:

For NZ creators, watermarked audio is becoming the norm as YouTube and TikTok audiences grow used to AI voices. If your brand is built on authenticity, a watermarked AI voice might feel wrong. But if you're shipping fast and cost matters, the watermark is simply an honest label.

Integration: Where Each Tool Lives

| Tool | Free test | API | Web UI | Plugin/extension |
|---|---|---|---|---|
| Gemini 3.1 Flash TTS | Yes (Google AI Studio) | Yes (Gemini API) | Yes (Google Vids) | No |
| ElevenLabs | Limited (10K credits) | Yes | Yes (dashboard) | Chrome extension |
| OpenAI TTS | With ChatGPT Pro | Yes (API) | Web + ChatGPT | No |

For NZ creators: Google Vids (free for creators) integrates Gemini TTS directly. If you already use a Google account for YouTube, you can test it without signing up for anything else.

Real Recommendation

Use Gemini 3.1 Flash TTS if:

Use ElevenLabs if:

Use OpenAI TTS if:

For most NZ creators and small businesses, Gemini 3.1 Flash TTS is the no-brainer starting point. It's free to test, it's dramatically cheaper than competitors, and the audio quality is good enough for YouTube, podcasts, and automated messaging. Upgrade to ElevenLabs only when your brand voice becomes non-negotiable.

NZ Availability

Gemini 3.1 Flash TTS is available now in New Zealand via Google AI Studio (free tier), the Gemini API, and Google Vids, with no VPN or special regional setup needed.

Pricing is quoted in USD per million tokens, so what you pay in NZD moves with the exchange rate. At current rates (USD 1 ≈ NZD 1.65), expect to pay roughly NZD $0.50 per 1,000 words with Gemini, compared with NZD $5+ with ElevenLabs.
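Because Gemini bills per token rather than per character, the cost of a script depends on how many tokens the text and the generated audio consume. Below is a rough Python estimator. The USD prices come from the comparison table and the exchange rate from the paragraph above; the tokens-per-word, audio-tokens-per-second, and speaking-rate defaults are illustrative assumptions, not figures published by Google, so treat the output as a ballpark and check the official pricing docs before budgeting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeminiTtsPricing:
    # USD per 1M tokens, from the comparison table in this article.
    input_usd_per_m: float = 1.00
    output_usd_per_m: float = 20.00
    # ASSUMPTIONS (not published Google figures); adjust to match the docs.
    input_tokens_per_word: float = 1.3
    audio_tokens_per_second: float = 25.0
    words_per_minute: float = 150.0
    # Exchange rate quoted in this article (USD 1 ≈ NZD 1.65).
    nzd_per_usd: float = 1.65


def estimate_cost_nzd(words: int, p: GeminiTtsPricing = GeminiTtsPricing()) -> float:
    """Rough NZD cost of turning `words` words of script into speech."""
    input_tokens = words * p.input_tokens_per_word              # text sent to the model
    audio_seconds = words * 60 / p.words_per_minute             # length of the narration
    output_tokens = audio_seconds * p.audio_tokens_per_second   # audio returned
    usd = (input_tokens * p.input_usd_per_m + output_tokens * p.output_usd_per_m) / 1_000_000
    return usd * p.nzd_per_usd


if __name__ == "__main__":
    print(f"~NZD ${estimate_cost_nzd(1_000):.2f} per 1,000 words")
    # A faster read means fewer seconds of audio, so fewer output tokens.
    brisk = GeminiTtsPricing(words_per_minute=180)
    print(f"~NZD ${estimate_cost_nzd(1_000, brisk):.2f} per 1,000 words at 180 wpm")
```

With these defaults the estimate lands at roughly NZD $0.33 per 1,000 words, in the same ballpark as the figure above. Output audio dominates the bill, so the audio-tokens-per-second and speaking-rate assumptions matter far more than the input side.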


Recommended tools for NZ creators:

Affiliate disclosures: This article references Google (no affiliate link), OpenAI (API affiliate eligible), ElevenLabs (affiliate: https://try.elevenlabs.io/7w6xpqq7vgq9), Riverside.fm (affiliate), and Runway (affiliate). No commission is earned on Gemini TTS, which is free to test in Google AI Studio.

Toby Downs is an independent tech writer based in New Zealand, covering SaaS, AI tools, and business software for tpdowns.com. No paid placements, no sponsored opinions — just research.