Cerebras Systems · via Greenhouse

Principal Engineer, Inference Cloud

Sunnyvale, CA · Posted 1mo ago
Other · Staff+ · Full-time · #ai-lab

About the Role

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.  

Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference.

Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

Location: Sunnyvale 

We're hiring a Principal Engineer for our Inference Cloud Platform. This team owns the cloud layer behind our Inference Service, including availability, latency, reliability, and multi-region scale. 

This is one of the most senior IC roles on the team, for someone who can identify the highest-leverage platform problems, set direction across multiple teams, define long-term architecture, and write production code on critical paths. 

Many of the key decisions are ambiguous at the outset; you’ll need to frame the problem, make tradeoffs, and drive execution without a clear spec. 

The scope includes multi-region traffic architecture, graceful degradation under bursty AI workloads, high-QPS performance, and the operating model for a platform that needs to remain fast and available under changing demand. You'll partner closely with ML, Product and Infrastructure teams. 

Responsibilities 

  • Problem Definition & Prioritization. Identify the most important technical problems for the platform, often before there's a clear ask. Make explicit tradeoff decisions about what the platform will and won't support, with reasoning that holds up under scrutiny from senior engineering leadership. 
  • Platform Direction. Set the long-term technical direction for the Inference Cloud Platform, including multi-region topology, failure domains, service boundaries, and system evolution over time. 
  • Reliability & Performance. Architect active-active systems with rapid failover and graceful degradation (circuit breaking, backpressure, load shedding), backed by clear SLOs. Drive improvements in latency, throughput, capacity efficiency, and resilience under unpredictable demand.
  • Code & Design Reviews. Contribute production code in critical paths, review designs and implementations, and make architectural decisions including build-vs-buy tradeoffs with long-term operational consequences.