EPAM Systems via Indeed
Lead AI Engineer
Remote (work from home), Argentina · Posted 3 months ago
ML Engineer · Lead · Tags: langchain, llm, aws, gcp, azure, typescript
About the Role
We are building a unified GenAI Platform that helps hundreds of product teams create, test, and deploy AI agents quickly and safely. As a Lead AI Engineer, you will deliver end-to-end agents and platform modules, improve CI/CD and observability, and guide architecture and execution across major initiatives. Join us and apply today.
Responsibilities
- Deliver end-to-end applications, AI agents, and platform modules that enable rapid GenAI agent creation, evaluation, and deployment
- Develop developer tooling, automate CI/CD workflows, and strengthen observability for secure delivery cycles, including evaluation frameworks, canary releases, rollbacks, and monitoring of cost and quality
- Embed secure development lifecycle and privacy-first practices, including threat analysis and least-privilege enforcement
- Collaborate with product owners, UX specialists, and subject matter experts to build user-focused solutions with measurable impact
- Apply modern LLM techniques such as retrieval-augmented generation, intelligent routing, tool integration, and evaluation to improve reliability, reduce decision time, increase trust and safety, and optimize query costs
- Provide technical leadership by defining architecture, mentoring engineers, and driving large initiatives
Requirements
- 5+ years of hands-on software engineering experience in production environments
- At least 1 year of experience leading and supervising engineering teams
- Proven track record of shipping software independently or with small, fast-moving teams
- Practical experience taking AI agents from idea to production, including safety checks, A/B testing experiments, and iterative improvements
- Proficiency with LangChain or LangGraph, the Model Context Protocol (MCP), vector databases, and OpenSearch
- Solid understanding of machine learning workflows, including model training, deployment, and monitoring
- Knowledge of compliance frameworks such as SOC 2 and HIPAA
- Strong analytical thinking, clear communication, and an ownership mindset
- Advanced full-stack development skills and familiarity with cloud environments such as AWS, Azure, or GCP
- Experience with CI/CD automation and Infrastructure as Code practices
- Understanding of Site Reliability Engineering concepts and operational reliability
- Background in quality assurance practices and testing methodologies
- Awareness of secure development lifecycle and privacy-first engineering approaches
- Strong TypeScript skills demonstrated in real-world applications
- English proficiency at B2+ (Upper-Intermediate) or higher, written and spoken
Nice to have
- Understanding of how large language models work, their limitations, and approaches such as fine-tuning and model customization
- Experience building Retrieval-Augmented Generation (RAG) applications