Salad Inference Endpoints (SIE)

Instantly deploy ML production inference to thousands of dedicated GPUs—for a fraction of the cost of public cloud.

How It Works

Salad is a fully managed orchestration platform for AI/ML deployment. With just one click, the Salad Inference Endpoints API lets you scale inference on demand without configuring infrastructure.

Ready-To-Use Models

Usage-Based Pricing

Never pay for more than you use. Our industry-leading inference rates get even better with volume pricing.

$0.25/hour

3,000+ Stable Diffusion images per dollar
900,000+ BERT inferences per dollar
Save with volume pricing
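
As a rough check on these figures, the sketch below converts a per-dollar throughput into the average compute time available per request. It assumes the $0.25/hour rate listed above is the rate behind both per-dollar numbers, which this page does not state explicitly; the function name is illustrative.

```python
HOURLY_RATE_USD = 0.25  # assumption: the listed rate underlies both per-dollar figures


def seconds_per_request(requests_per_dollar: float,
                        hourly_rate_usd: float = HOURLY_RATE_USD) -> float:
    """Average compute seconds each request can take at the given hourly rate."""
    # One dollar buys this many seconds of compute time.
    seconds_per_dollar = 3600.0 / hourly_rate_usd
    return seconds_per_dollar / requests_per_dollar


print(seconds_per_request(3_000))    # Stable Diffusion images
print(seconds_per_request(900_000))  # BERT inferences
```

Under that assumption, 3,000 images per dollar works out to roughly 4.8 seconds of compute per image, and 900,000 BERT inferences per dollar to roughly 16 milliseconds each.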

Frequently Asked Questions

What is the difference between SCE & SIE?
Salad Container Engine (SCE) is a fully managed orchestration service that runs container images from private or public registries far more affordably than competing products (such as AWS Fargate or Google Cloud Run). Salad Inference Endpoints (SIE) allows developers to deploy deep learning models for production-scale inference via simple API calls. Teams accustomed to hyperscale solutions will find the workflow similar to that of the Amazon SageMaker inference APIs.
How are my costs calculated?
Salad Inference Endpoints is a fully managed inference service that automatically selects the most cost-efficient hardware configuration for your model. You will only be billed for real-time usage while a Salad machine computes your request. Based on GPU and vCPU availability, Salad's workload engine may select a secondary configuration, which could result in marginally greater cost-per-inference or reduced performance.
How does SIE handle cold starts?
In order to minimize loading times for ready-to-use models, Salad maintains standby nodes with a variety of popular deep learning models pre-installed. When deploying a custom model, you may experience cold starts while our workload orchestration engine selects ideal hardware and distributes the necessary dependencies over encrypted communication. We never charge for loading times; billing periods last from the exact moment that a GPU or CPU begins computing your request until results are returned.
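
The billing rule described here can be sketched as follows. This is an illustration only, not Salad's billing implementation: the `RequestTimeline` type, its field names, and the reuse of the $0.25/hour rate are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RequestTimeline:
    """Hypothetical timestamps (in seconds) for one inference request."""
    submitted_at: float         # request received by the service
    compute_started_at: float   # a GPU or CPU begins computing the request
    results_returned_at: float  # results are sent back to the caller


def billable_seconds(t: RequestTimeline) -> float:
    """Only the compute window counts; loading and cold-start time is excluded."""
    return max(0.0, t.results_returned_at - t.compute_started_at)


def request_cost_usd(t: RequestTimeline, hourly_rate_usd: float = 0.25) -> float:
    """Cost of one request at an assumed hourly rate."""
    return billable_seconds(t) * hourly_rate_usd / 3600.0


# A cold start (30 s of loading) and a warm start with the same compute time.
cold = RequestTimeline(submitted_at=0.0, compute_started_at=30.0, results_returned_at=34.8)
warm = RequestTimeline(submitted_at=0.0, compute_started_at=0.5, results_returned_at=5.3)

print(billable_seconds(cold), billable_seconds(warm))
```

In this sketch the cold and warm requests are billed for the same compute window, even though the cold request waited much longer before computation began.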
Can I use SIE with my own model?
Yes! To begin submitting custom models to SIE services, please contact our onboarding team to request private access to your unique API endpoint.
Why do some requests take longer than others?
SaladCloud infrastructure consists of heterogeneous hardware located around the world. With so many different GPUs and CPUs on our network, our orchestration engine has been designed to assess their unique configurations and optimize model performance by distributing tasks to the best available machines. You may manually select a GPU type via the Salad Container Engine (SCE), or message our onboarding team to request access to a custom hardware cohort.
When can I access SIE?
Our network offers instant access to thousands of dedicated GPUs. Now that Salad Container Engine (SCE) has entered public beta, we will begin onboarding new SIE users on a rolling basis starting in January 2023. Join the waitlist today to receive early access!