Harnessinc via Greenhouse

Staff Cloud Engineer

REMOTE · Posted 2mo ago

devops · Staff+ · Full-time · #remote

About the Role

Harness is the AI Software Delivery Platform company, led by technologist and entrepreneur Jyoti Bansal (founder of AppDynamics, acquired by Cisco for $3.7B). Harness has raised approximately $570M in funding and is valued at $5.5B, backed by leading investors including Goldman Sachs, Menlo Ventures, IVP, Unusual Ventures, Citi Ventures, and more. As AI accelerates code creation, the real bottleneck has shifted to everything after the code – testing, deployments, application security, reliability, compliance, and cost optimization. Harness brings AI and automation to this “outer loop,” helping teams ship software faster while maintaining security and governance throughout the entire software delivery lifecycle.

Powered by Harness AI and the Software Delivery Knowledge Graph, the Harness Platform applies deep context and intelligent automation across the software delivery lifecycle with governance and policy-driven controls embedded throughout the platform.

Over the past year, Harness powered over 185M deployments, 82M builds, 18T flag evaluations, 8M security scans, 9.1B optimized tests, 3T protected API calls, and helped manage $2.8B in cloud spend — enabling customers like United Airlines, Morningstar, and Choice Hotels to accelerate releases by up to 75%, reduce cloud costs by up to 60%, and achieve 10x DevOps efficiency.

With a global team across 26 offices and 25 countries, Harness is shaping the future of AI software delivery — and we’re looking for exceptional talent to help us move even faster.

Position Summary

As a Staff Cloud Engineer at Harness, you will play a pivotal role in designing, building, and maintaining our cloud infrastructure. You will be responsible for ensuring the reliability, scalability, and performance of our systems, incorporating a blend of Cloud Engineering and Site Reliability Engineering (SRE) practices. This role requires a strong technical background, a passion for innovation, and the ability to work collaboratively in a fast-paced environment.

Key Responsibilities

Cloud Infrastructure, Distributed Systems & Platform Engineering:

Design, build, and manage scalable, secure, and reliable cloud infrastructure using GCP, AWS or Azure.
Develop infrastructure-as-code using tools such as Terraform, CloudFormation, or similar.
Lead the design and evolution of scalable, secure, multi-tenant, multi-region cloud platforms across AWS, GCP, and Azure.
Architect and build control planes, orchestration systems, and shared platform services used across teams.
Design and operate highly available, fault-tolerant, and self-healing distributed systems at scale.
Define and enforce SLO-driven architectures, reliability standards, and resilience strategies.
Drive infrastructure-as-code and platform abstractions to standardize and simplify deployments.
Own capacity planning, scalability strategy, and performance optimization.

Observability & Operational Excellence:

Establish and scale monitoring, logging, and alerting frameworks for proactive issue detection.
Lead incident response, root cause analysis, and continuous reliability improvements.
Drive system-wide performance and efficiency optimizations, weighing scalability trade-offs.

Site Reliability Engineering (SRE):

Implement SRE practices to ensure the reliability, availability, and performance of cloud services.
Develop and maintain monitoring, logging, and alerting systems to detect and address issues proactively.
Perform capacity planning and demand forecasting to ensure system scalability and performance.

Automation & CI/CD:

Architect and scale Kubernetes-based platforms and deployment systems (Helm, orchestration).
Design and implement CI/CD pipelines and platform-level automation frameworks.
Enable reliable, repeatable, and efficient delivery of applications and infrastructure.

Security & Compliance:

Define and enforce security best practices and compliance standards across infrastructure and platforms.
Lead security reviews, risk assessments, and continuous improvement initiatives.

Leadership & Collaboration:

Provide technical leadership across teams, driving architecture consistency and long-term strategy.
Mentor engineers and elevate engineering standards, best practices, and design rigor.
Partner with product and engineering stakeholders to align platform strategy with business goals.

About You

Technical Expertise:

Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
8+ years of experience in cloud engineering, site reliability engineering, or related roles.
Strong experience with cloud platforms (AWS, GCP, Azure) and cloud-native services.
Proficiency in infrastructure-as-code tools (Terraform), the Helm package manager, and configuration management tools (Ansible, Chef, Puppet).
Experience with AIOps.

SRE Practices:

Experience with SRE principles, including error budgets, SLIs, SLOs, and incident management.
Strong knowledge of monitoring and observability tools (Prometheus, Grafana, GCM).

Automation & DevOps:

Expertise in building and managing CI/CD pipelines using tools like Jenkins, GitLab CI, CircleCI or Harness.
Strong coding skills (Python, Go, etc.) and familiarity with version control systems (Git).

Security & Compliance:

Understanding of security best practices for cloud infrastructure and applications.