AI Infrastructure Operations Engineer
AI Infrastructure & Operations Engineer
Location: Remote (U.S.)
Reports To: Juan Sandoval-Tobias
About Private Health Management
Private Health Management guides patients and families through the chaos and complexity of serious medical care, for cancer and beyond. As an independent advocate and trusted partner during healthcare's most challenging moments, our team of experts works alongside physicians to orchestrate the best treatment, advocacy, and support to deliver measurably better outcomes.
Everyone deserves great care. We make sure they get it.
About the Role
PHM is building and scaling Companion, an AI-enabled clinical platform operating in a high-trust healthcare environment where reliability, observability, and security are foundational requirements. The platform includes headless AI agents designed to support clinical and operational professionals by acting as intelligent workstations that integrate with enterprise applications and workflows.
The AI Infrastructure & Operations Engineer will operationalize the platform so it runs reliably at production scale, helping ensure the systems behind Companion are observable, recoverable, secure, and maintainable as adoption grows.
This role sits at the intersection of Kubernetes operations, AI platform reliability, observability engineering, and operational security. You will help evolve and maintain the Azure-based infrastructure stack while partnering closely with technology leadership, AI architects, and security stakeholders. This is a high-ownership role for someone who thrives in fast-moving environments, is comfortable operating with incomplete information, and enjoys building operational discipline around emerging AI systems.
What You’ll Accomplish
- Establish operational reliability for Companion across AKS infrastructure, AI agent workloads, monitoring systems, and deployment pipelines.
- Build meaningful observability practices that help PHM understand platform behavior, usage trends, and operational risks before they become incidents.
- Create sustainable operational hygiene around patching, CVE remediation, secrets rotation, dependency management, and cloud maintenance cycles.
- Strengthen platform resilience, documentation, and operational processes so the environment can scale without relying on tribal knowledge.