Stack Ai (via Ashby)

AI Infrastructure Engineer

SF Office - 171 2nd, 4th floor
$150K - $220K/yr
Posted 8mo ago
MLOps, Mid Level, Full-time


ABOUT THE ROLE

We’re hiring an AI Infrastructure Engineer to shape and scale the backend systems that power our AI platform. Because we are a Series A company, your work will be foundational, enabling safe, efficient, and reliable AI workflows from end to end.


WHAT YOU’LL DO

- Design and implement scalable backend architectures for AI workloads (inference, orchestration, monitoring).

- Own distributed job orchestration with Temporal and related systems.

- Improve data pipeline performance by designing smarter caching strategies (e.g., file deduplication, hot/cold storage, Redis caching layers) to reduce redundant compute and API calls.

- Build observability, monitoring, retries, and fault tolerance into all workflows.

- Manage infrastructure reliability, incident response, and performance.

- Develop tooling and platform infrastructure to support rapid growth.

- Partner with ML engineers to bring models to production at scale.


WHAT WE’RE LOOKING FOR

- 4+ years of backend engineering (Python is a must).

- Strong background in distributed systems, job orchestration, and task queues.

- Deep knowledge of concurrency, parallelism, and multithreading is a must, including async/await, event loops, thread pools, synchronization primitives, deadlocks, and race conditions. You should know how to design systems that maximize throughput without sacrificing correctness or safety.

- Hands-on experience with Temporal, Redis, Airflow, Celery, RabbitMQ (or similar).

- Experience with LLM serving and routing fundamentals (rate limiting, streaming, load balancing, budgets).

- Comfortable with containers & orchestration: Docker, Kubernetes.

- Familiarity with cloud platforms (AWS/GCP) and IaC (Terraform).

- Experience with multiple storage systems: S3, Postgres, MongoDB, Redis, and Elasticsearch.

- Track record scaling systems in startups or fast-paced environments.

- Understanding of deploying, monitoring, and optimizing AI/ML systems in production with strong CI/CD practices.


WHY YOU’LL LOVE WORKING HERE

- Play a foundational role at a fast-growing Series A startup that is shaping the future of AI in enterprise workflows.

- Collaborate across Product, ML, and Platform teams as the bridge between AI logic and scalable execution.

- Build infrastructure that enables real value for large enterprises: low-code, secure, and scalable AI workflows.

- Join a company that’s scaling thoughtfully and values developer experience.