Cerebras Systems via Greenhouse

Engineering Manager, Inference Cloud

Sunnyvale, CA or Toronto, Canada · Posted 2mo ago
Other · Lead · Full-time · #ai-lab

About Cerebras Systems

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and enables machine learning users to run large-scale ML applications without the hassle of managing hundreds of GPUs or TPUs.

Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of capacity, transforming key workloads with ultra-high-speed inference.

Thanks to its wafer-scale architecture, Cerebras Inference offers the fastest generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

Location: Sunnyvale 

About the Role

We're looking for a deeply technical, hands-on engineering leader to scale our Inference Cloud Platform. This team owns the cloud layer that powers our Inference Service, with direct responsibility for availability, reliability, latency, and global scalability. 

You'll lead a high-performing team building the systems that keep inference fast and reliable at massive scale: multi-region traffic management, intelligent routing, graceful degradation under load, and best-in-class observability. The work sits at the intersection of distributed systems, cloud-native infrastructure, and the unique demands of serving AI workloads in production, including the bursty, unpredictable traffic patterns specific to model serving.

If you're passionate about building resilient, globally distributed systems and solving hard infrastructure problems for AI, we'd love to talk. 

Responsibilities 

Technical Strategy & Architecture 

  • Platform Vision & Roadmap. Own the technical direction for the Inference Cloud Platform, prioritizing cloud-native scalability, reliability, and multi-region architecture. 
  • Core Infrastructure. Lead the design of foundational cloud-layer systems including service discovery, service mesh, request routing, load balancing, caching, and batching to optimize latency, throughput, and cost efficiency. 
  • Resilience & High Availability. Build and operate fault-tolerant, active-active multi-region systems with strong SLAs/SLOs, rapid failover, and graceful degradation strategies such as circuit breaking, backpressure, and load shedding. 
  • Traffic Control & Quality of Service. Design traffic prioritization, rate limiting, quota management, and admission control systems, including differentiated service tiers with independent SLOs, to ensure fairness and protect stability under extreme load.
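The rate limiting, admission control, and differentiated service tiers listed above can be illustrated with a per-tier token bucket. The following is a minimal Python sketch under stated assumptions, not a description of Cerebras's actual system: the `TokenBucket` and `AdmissionController` classes, the tier names, and their rates are all hypothetical. Each tier gets its own bucket, so a burst in one tier is shed without consuming another tier's capacity.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec, holding at most `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an idle tier can absorb a burst
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over budget: caller should shed or reject the request


class AdmissionController:
    """Hypothetical per-tier admission control: one independent bucket per
    service tier, so each tier's limit is enforced in isolation."""

    def __init__(self, limits: Dict[str, Tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # limits maps tier name -> (refill rate per second, burst capacity)
        self.buckets = {tier: TokenBucket(rate, cap, clock)
                        for tier, (rate, cap) in limits.items()}

    def admit(self, tier: str, cost: float = 1.0) -> bool:
        bucket = self.buckets.get(tier)
        if bucket is None:
            return False  # unknown tier: reject rather than guess a budget
        return bucket.try_acquire(cost)
```

The clock is injectable so the refill behaviour can be exercised deterministically, for example by passing `clock=lambda: t[0]` where `t` is a list the caller advances by hand. A real system would also need distributed coordination of these counters across regions, which this single-process sketch does not attempt.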