Featherlessai via Ashby

AI Researcher — Inference Optimization

REMOTE · Posted 3mo ago

Research · Mid Level · Full-time

About the Role

ROLE OVERVIEW

We are seeking an AI Researcher with deep experience in inference optimization to design, evaluate, and deploy high-performance inference systems for large-scale machine learning models. You will work at the intersection of model architecture, systems engineering, and hardware-aware optimization, improving latency, throughput, and cost efficiency across real-world production environments.

KEY RESPONSIBILITIES

- Research and develop techniques to optimize inference performance for large neural networks.

- Improve latency, throughput, memory efficiency, and cost per inference.

- Design and evaluate model-level optimizations (quantization, pruning, KV-cache optimization, architecture-aware simplifications).

- Implement systems-level optimizations (dynamic batching, kernel fusion, multi-GPU inference, separate optimization of the prefill and decode phases).

- Benchmark inference workloads across hardware accelerators.

- Collaborate with engineering teams to deploy optimized inference pipelines.

- Translate research insights into production-ready improvements.

REQUIRED QUALIFICATIONS

- Strong background in machine learning, deep learning, or AI systems.

- Hands-on experience optimizing inference for large-scale models.

- Proficiency in Python and modern ML frameworks (e.g., PyTorch).

- Experience with inference tooling (e.g., Triton, TensorRT, vLLM, ONNX Runtime).

- Ability to design experiments and communicate results clearly.

PREFERRED / NICE-TO-HAVE QUALIFICATIONS

- Experience deploying production inference systems at scale.

- Familiarity with distributed and multi-GPU inference.

- Experience contributing to open-source ML or inference frameworks.

- Authorship or co-authorship of peer-reviewed research papers in machine learning, systems, or related fields.

- Experience working close to hardware (CUDA, ROCm, profiling tools).

WHAT SUCCESS LOOKS LIKE

- Measurable gains in latency, throughput, and cost efficiency.

- Optimized inference systems running reliably in production.

- Research ideas successfully translated into deployable systems.

- Clear benchmarks and documentation that inform product decisions.

RELEVANT RESEARCH AREAS (BONUS)

- Long-context inference optimization

- Speculative decoding

- KV-cache compression and paging

- Efficient decoding strategies

- Hardware-aware inference design
