Kapture CX · via Indeed

ML Ops Engineer

KA, IN · Posted 2mo ago
MLOps · Mid Level · #python #llm #kubernetes #docker #aws #gcp #azure #ray

About the Role

Role Name: MLOps Engineer

At Kapture CX, we are looking for an MLOps Engineer to join our AI/ML team.

Who are we?

Kapture CX is a leading SaaS platform that helps enterprises automate and elevate customer experience through intelligent, AI-powered solutions. We partner with enterprises across industries to bring scalable automation and insight-driven efficiencies to their CX operations. Over a thousand clients across 18 countries have used Kapture’s products to enhance their customer experience, including Unilever, Reliance, Coca-Cola, Bigbasket, Meesho, Airtel Payments Bank, and Cathay Pacific. Kapture delivers industry-specific solutions powered by AI, tailored workflows, and seamless automation.
Kapture CX is headquartered in Bangalore, with additional offices in Mumbai and Delhi/NCR in India, as well as in the USA, the UAE, Singapore, the Philippines, and Indonesia.

What is this role all about?

As an MLOps Engineer, you will own the infrastructure backbone of our conversational AI platform. You will design and manage high-performance model serving systems for LLM, ASR, and TTS workloads — ensuring reliability, scalability, and low-latency performance at production scale.
This is a high-impact role where your architectural decisions will directly influence system speed, cost efficiency, and millions of real-time AI interactions.

Sounds interesting?

Here’s a more detailed description of what you will do in this role:
You will design, deploy, and maintain high-throughput, low-latency serving infrastructure for AI models across LLM, ASR, and TTS systems.
You will evaluate and select inference engines such as vLLM, SGLang, LMDeploy, or TensorRT-LLM based on workload requirements and performance trade-offs (a minimal serving sketch follows this list).
You will implement and optimise quantization strategies (INT8, INT4, and FP8 formats, applied via methods such as GPTQ, AWQ, and SmoothQuant) to maximise performance within compute constraints.
You will architect distributed serving strategies including tensor, pipeline, and data parallelism.
You will build containerised, reproducible deployment pipelines using Docker and Kubernetes.
You will define and monitor critical performance metrics such as TTFT (time to first token), latency percentiles (P50/P95/P99), throughput, and GPU utilisation.
You will collaborate closely with ML Engineers during model handoffs to translate model requirements into production-ready infrastructure.
You will build CI/CD pipelines to enable continuous and zero-downtime model deployments.
This is a Bangalore-based role. We work five days a week from the office, as we believe in-person interactions fuel innovation and agility.
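
To make the serving work above concrete, here is a minimal sketch of batch inference with vLLM, one of the engines named in this posting. The model checkpoint, quantization choice, and two-GPU tensor-parallel setup are illustrative assumptions, not a prescribed stack.

# Minimal sketch: offline batch inference with vLLM. Model name, quantization,
# and parallelism degree are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",  # hypothetical AWQ-quantised checkpoint
    quantization="awq",           # pre-quantised weights to fit tighter memory budgets
    tensor_parallel_size=2,       # shard weights across two GPUs (tensor parallelism)
    gpu_memory_utilization=0.90,  # fraction of GPU memory for weights plus KV cache
)

params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)
outputs = llm.generate(["How do I reset my password?"], params)
print(outputs[0].outputs[0].text)

In production the same engine would more likely sit behind vLLM’s OpenAI-compatible HTTP server than be called in-process, but the trade-offs (quantization, parallelism degree, memory budget) are configured the same way.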

What does success look like in this role?

You build reliable, scalable, and cost-efficient AI serving infrastructure that consistently meets performance SLAs while enabling rapid experimentation and deployment.
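
One way to verify such SLAs is a small load test that streams completions and records TTFT and end-to-end latency. The sketch below assumes an OpenAI-compatible streaming endpoint at a hypothetical local URL (vLLM and similar engines expose one) and a placeholder model name; it uses only the standard library plus requests.

# Minimal sketch: measure TTFT and latency percentiles against a streaming,
# OpenAI-compatible /v1/completions endpoint. URL and model name are assumptions.
import statistics
import time

import requests

URL = "http://localhost:8000/v1/completions"  # hypothetical serving endpoint

def one_request(prompt):
    """Return (ttft_seconds, total_seconds) for a single streamed completion."""
    payload = {"model": "demo-model", "prompt": prompt, "max_tokens": 64, "stream": True}
    start = time.perf_counter()
    ttft = None
    with requests.post(URL, json=payload, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line and ttft is None:
                ttft = time.perf_counter() - start  # first streamed chunk arrived
    total = time.perf_counter() - start
    return (ttft if ttft is not None else total), total

ttfts, totals = [], []
for _ in range(50):
    t, e = one_request("How do I reset my password?")
    ttfts.append(t)
    totals.append(e)

# statistics.quantiles with n=100 yields 99 cut points; indices 49/94/98 = P50/P95/P99.
q = statistics.quantiles(totals, n=100)
print(f"TTFT p50: {statistics.median(ttfts) * 1000:.0f} ms")
print(f"latency p50/p95/p99: {q[49] * 1000:.0f}/{q[94] * 1000:.0f}/{q[98] * 1000:.0f} ms")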

What would make you a good fit for this role?

Here are the basic requirements:
You have 3–4 years of hands-on experience in MLOps, AI infrastructure, or production model deployment.
You have strong practical knowledge of inference engines such as vLLM, SGLang, LMDeploy, or similar frameworks.
You have hands-on experience with GPU memory management, KV cache optimisation, and quantization techniques.
You are proficient in Python and comfortable working with containerisation and cloud environments.
You have experience with Docker, Kubernetes, and at least one major cloud provider (AWS, GCP, or Azure).

What are the most critical skills for this role?

You understand distributed systems principles and how they apply to AI inference workloads.
You can benchmark, profile, and optimise latency and throughput in real-world conditions.
You are comfortable working with monitoring tools such as Prometheus and Grafana (see the instrumentation sketch after this list).
You make principled architectural decisions based on performance, scalability, and cost trade-offs.
You communicate infrastructure constraints clearly to ML and product teams.
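
As an illustration of the monitoring point above, here is a minimal sketch using the official prometheus_client library; the metric names, bucket boundaries, and scrape port are assumptions, and Grafana would sit on top of the resulting Prometheus data.

# Minimal sketch: expose latency and utilisation metrics for Prometheus to scrape.
# Metric names, buckets, and port are illustrative assumptions.
import random
import time

from prometheus_client import Gauge, Histogram, start_http_server

REQUEST_LATENCY = Histogram(
    "inference_request_latency_seconds",
    "End-to-end inference latency",
    buckets=(0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0),  # place buckets around your SLA
)
GPU_UTIL = Gauge("gpu_utilization_ratio", "Fraction of GPU capacity in use")

def handle_request():
    with REQUEST_LATENCY.time():  # records elapsed time into the histogram
        time.sleep(random.uniform(0.05, 0.3))  # stand-in for real inference work

if __name__ == "__main__":
    start_http_server(9100)  # Prometheus scrapes http://host:9100/metrics
    while True:
        handle_request()
        GPU_UTIL.set(random.random())  # in practice, read this from NVML instead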

You will have an advantage if:

You have experience with Ray for distributed orchestration and autoscaling (see the sketch after this list).
You are familiar with TensorRT-LLM or hardware-optimised inference runtimes.
You have deployed ASR or TTS models in production environments.
You have experience with CUDA or low-level GPU optimisation.
You are familiar with emerging serving techniques such as continuous batching and prefix caching, or with serving MoE architectures.
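
For the Ray point above, a deployment with autoscaling might look like the minimal sketch below; the deployment class, replica bounds, per-replica GPU count, and the stubbed ASR call are illustrative assumptions rather than a reference architecture.

# Minimal sketch: a Ray Serve deployment that autoscales between replica bounds.
# Class name, bounds, GPU count, and the stubbed logic are assumptions.
from ray import serve

@serve.deployment(
    ray_actor_options={"num_gpus": 1},                          # one GPU per replica
    autoscaling_config={"min_replicas": 1, "max_replicas": 4},  # scale with load
)
class Transcriber:
    async def __call__(self, request):
        payload = await request.json()
        # Stand-in for a real ASR model invocation.
        return {"text": f"transcribed {len(payload.get('audio', ''))} bytes"}

app = Transcriber.bind()
# serve.run(app)  # uncomment on a Ray cluster to start serving HTTP traffic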

Why should you be interested?

Here’s what you will gain from this role:
Opportunity to architect and scale the infrastructure behind a next-generation conversational AI platform.
Hands-on exposure to cutting-edge inference systems and large-scale AI deployments.
Collaboration with a highly skilled AI and engineering team in a fast-growing global SaaS organisation.
Strong growth opportunities with competitive compensation and performance-linked rewards.