Speech-to-text & Text-to-speech

Save up to 90% on Voice AI inference

Whether you are serving inference for Speech-to-Text, Text-to-Speech, or chatbots, or batch-translating thousands of hours of audio, Salad’s consumer GPUs can reduce your cloud cost by up to 90% compared to managed services and APIs.


Have questions about enterprise pricing for SaladCloud?

Book a 15 min call with our team.

Run popular models or bring your own models

Speech-to-Text

For use cases such as automatic speech recognition (ASR), translation, captioning, and subtitling, SaladCloud costs at least 90% less than APIs and hyperscalers.

$0.005
cost per hour
Save 99% on audio transcription with self-managed Whisper on SaladCloud.
47,638
minutes per dollar
Get a 1,000-fold cost reduction with Parakeet TDT 1.1B compared to popular APIs.
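To put the minutes-per-dollar figure on the same footing as a per-hour price, here is a minimal sketch of the conversion. The function names are illustrative, not part of any SaladCloud API, and the result is simple arithmetic on the number quoted above rather than a separately published price.

```python
def cost_per_audio_hour(minutes_per_dollar: float) -> float:
    """Convert 'minutes of audio per dollar' into dollars per hour of audio."""
    if minutes_per_dollar <= 0:
        raise ValueError("minutes_per_dollar must be positive")
    return 60.0 / minutes_per_dollar


def batch_cost(audio_hours: float, minutes_per_dollar: float) -> float:
    """Estimated cost in dollars to transcribe a batch of audio."""
    return audio_hours * cost_per_audio_hour(minutes_per_dollar)


# 47,638 minutes per dollar is the Parakeet TDT 1.1B figure quoted above.
PARAKEET_MINUTES_PER_DOLLAR = 47_638

per_hour = cost_per_audio_hour(PARAKEET_MINUTES_PER_DOLLAR)
thousand_hours = batch_cost(1_000, PARAKEET_MINUTES_PER_DOLLAR)

print(f"~${per_hour:.4f} per hour of audio")
print(f"~${thousand_hours:.2f} for 1,000 hours of audio")
```

At that rate, an hour of audio works out to roughly a tenth of a cent, and a 1,000-hour batch to a little over a dollar, before any storage, networking, or engineering overhead.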

Text-to-Speech

Save up to 90% on Text-to-Speech (TTS) inference with SaladCloud’s consumer GPUs. RTX and GTX series GPUs deliver the best cost-performance for TTS inference.

6,000,000
words per dollar
The RTX 2070 & GTX 1650 deliver almost 6,000,000 words per dollar for TTS use cases with OpenVoice.
230
words per second
The RTX 3080 Ti delivers 230.4 words per second, offering the best speed-to-cost ratio at just $0.20/hour with OpenVoice.
39,000
words per dollar
The RTX 3060 and GTX 1060 deliver 39,000 words per dollar with the Bark text-to-speech model.
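Words per dollar follows directly from throughput and hourly GPU price. The sketch below applies that arithmetic to the RTX 3080 Ti figures quoted above (230.4 words per second at $0.20/hour). The function name is illustrative, and the calculation assumes the GPU is fully utilized for the whole billed hour with no other costs, so treat the result as a derived upper bound rather than a benchmark.

```python
SECONDS_PER_HOUR = 3600


def words_per_dollar(words_per_second: float, price_per_hour: float) -> float:
    """Words generated per dollar, assuming the GPU is busy for the full hour."""
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return words_per_second * SECONDS_PER_HOUR / price_per_hour


# RTX 3080 Ti with OpenVoice, using the two figures quoted above.
rtx_3080_ti = words_per_dollar(words_per_second=230.4, price_per_hour=0.20)

print(f"{rtx_3080_ti:,.0f} words per dollar")
```

This gives a little over 4.1 million words per dollar for the RTX 3080 Ti: fewer words per dollar than the RTX 2070 and GTX 1650 figure above, in exchange for higher throughput per instance.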

The Lowest Cost For Voice AI Inference

Voice AI models are a strong fit for consumer GPUs, which offer excellent cost-performance and can save thousands of dollars compared to running on public clouds.
Scale quickly to thousands of GPU instances worldwide without the need to manage VMs or individual instances, all with a simple usage-based price structure.

Transcription

Save up to 98% on transcription costs compared to the public cloud, with transcription running at about 60x real-time speed on RTX 3090s.

Translation

Get better machine translation economics on SaladCloud's network of GPUs at the lowest market prices.

Captioning / Subtitles

Cut AI captioning/subtitle generation costs by at least 50% on SaladCloud.