Senior Site Reliability Engineer (SRE)
About the Role
Join us in bringing joy to customer experience. Five9 is a leading provider of cloud contact center software, bringing the power of cloud innovation to customers worldwide.
Living our values every day results in our team-first culture and enables us to innovate, grow, and thrive while enjoying the journey together. We celebrate diversity and foster an inclusive environment, empowering our employees to be their authentic selves.
In this SRE role, you will focus on the foundational work required to modernize our application deployments. The immediate priority is not deep application code integration, but rather tackling technical debt and enhancing our legacy Linux-based systems. This requires strong Linux system administration and problem-solving skills to ensure stability during our transition to cloud-native workflows. The software development portion of this role is centered on creating internal tools to improve system management, automate operational tasks, and build out our observability stack. Your success in this area is critical for establishing meaningful SLIs and achieving our reliability targets.
Key Responsibilities
Observability & Monitoring
- Dashboards & Metrics: Design and implement comprehensive dashboards covering OS/platform-level and application-level monitoring, organized into primary (RED) and secondary (USE) indicators.
- Availability & Reliability: Establish and maintain SLIs, SLOs, and error budgets for the service.
- Performance Monitoring: Build alerting systems and performance monitoring to proactively identify and resolve issues before they impact users.
- Incident Response: Participate in on-call rotations, lead incident response efforts (including post-mortem analysis and remediation), maintain on-call routing, and assign application-level problems to engineering teams.
Infrastructure Automation & Deployment
- CI/CD Pipeline Management: Build and optimize CI/CD pipelines for speed and resilience.
- Infrastructure as Code: Develop and maintain infrastructure using tools like Terraform, Ansible, or similar.
- Configuration Management: Automate system configuration and ensure consistency across environments. Implement and recommend best practices for configuration control.
Security & Compliance
- Security Automation: Ensure security scanning systems are in place and review escalated vulnerabilities.
- Access Control: Maintain proper authentication, authorization, and audit logging systems.
- Compliance Reporting: Ensure systems meet regulatory and industry standards.
- Security Incident Response: Participate in security incident response and remediation efforts.
Cost Optimization
- Resource Management: Monitor and optimize cloud resource usage and costs.
- Capacity Planning: Analyze usage patterns and plan for future capacity needs.
- Cost Analysis: Provide recommendations for cost-effective architecture and resource allocation.
- Right-sizing: Implement automated scaling and resource optimization strategies.
Common Services & Platform Engineering
- Shared Infrastructure: Build and maintain common services (notification systems, caching layers, message queues, or third-party stacks).
- Database Operations: Manage database reliability, performance, and scaling (where not handled by DB teams).
- Service Mesh & Networking: Implement and maintain service discovery, load balancing, and network policies.
- Developer Tools: Create and maintain tools and platforms that improve developer productivity and reliability.