DevOps and SRE Jobs at AI Companies
DevOps and SRE engineers at AI companies manage some of the most demanding infrastructure in tech: GPU clusters for model training, low-latency inference serving, and highly available API platforms that handle millions of requests. These roles combine traditional infrastructure expertise with the challenges specific to AI systems.
Last updated: May 13, 2026
Latest DevOps & SRE Jobs at AI Companies
Senior Site Reliability Engineer I
Software Engineer II, Cloud Engineering
Senior Site Reliability Engineer I
Senior Site Reliability Engineer I
Senior Manager, DevOps
Lead Salesforce Platform Engineer - Enterprise Integration
Senior Engineering Manager – DevOps, Infrastructure & Release Engineering
Senior AI Platform Engineer
Sr Cloud Platform Engineer
Senior Platform Engineer, Cloud Networking & Mesh
Senior Infrastructure Engineer
Senior Cloud Engineer I
DevOps Engineer (f/m/x)
Software Engineering Manager - SRE
DevOps Engineer - Zoom Phone
Senior DevOps Engineer
Cloud Engineer- Senior
Senior Cloud Engineer (GCP)
Platform Engineer
Zoom AI DevOps Engineer
Frequently Asked Questions
What does a DevOps/SRE engineer do at an AI company?
DevOps and SRE engineers at AI companies manage GPU clusters for model training, build and maintain inference serving infrastructure, design CI/CD pipelines for ML workflows, and ensure high availability for AI APIs that serve millions of users. Unlike traditional DevOps roles, you'll work with specialized hardware (NVIDIA GPUs, TPUs), manage large-scale distributed training jobs, and optimize infrastructure costs that can run into millions of dollars per month. Core tools include Kubernetes, Terraform, Docker, and the major cloud platforms (AWS, GCP, Azure), and deep expertise in GPU orchestration is typically expected.
What is the salary for DevOps engineers at AI companies?
DevOps and SRE engineers at AI companies earn $140K-$210K at mid-level and $190K-$320K+ in senior and staff roles. The premium over traditional DevOps reflects the specialized skills needed for GPU infrastructure and large-scale distributed systems, and the critical nature of AI inference availability. Companies competing for cloud infrastructure talent in the AI space, particularly those managing large GPU clusters, often pay at the top of the market to attract and retain engineers who can keep their systems running reliably.
What skills do I need for DevOps at an AI company?
Core skills include Kubernetes (especially GPU scheduling), Terraform/Pulumi for infrastructure-as-code, CI/CD pipelines, and deep experience with at least one major cloud provider (AWS, GCP, or Azure). Experience with GPU workloads, NVIDIA CUDA, and container orchestration for ML training is highly valued. Monitoring and observability skills (Prometheus, Grafana, Datadog) are essential since AI systems have unique failure modes. You don't need to understand ML algorithms, but knowing how model training and inference work at an infrastructure level helps you make better architectural decisions.
AI Job Insights for DevOps & SRE Jobs at AI Companies
Salary Range (Yearly, USD)
$130K - $999K
Median $165K from 29 listings with salary data
Explore More AI Job Categories
Backend Engineer Jobs at AI Companies
Find backend engineering roles at top AI companies and startups. Build the systems that power AI products at scale.
MLOps Jobs
Find MLOps and ML Infrastructure roles. Build the platforms that power AI systems.
Security Engineer Jobs at AI Companies
Find security engineering roles at top AI companies. Protect AI systems, APIs, and user data.