Embedding Vc via Ashby
Member of Technical Staff - ML Infrastructure & Performance
San Mateo, CA · Posted 5mo ago
MLOps · Staff+ · Full-time
About the Role
Introducing Moonlake, AI for creating real-time interactive content
Mission: Improve throughput, latency, and cost by deploying our models 2–10× faster and cheaper without quality regressions.
Scope of Work:
- GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs.
- Serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on-GPU KV reuse; speculative decoding/Medusa; mixture-of-agents routing.
- Parallelism: FSDP/ZeRO, TP/PP/expert parallel; NCCL tuning.
- Quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving.
- Systems: Ray/k8s/Argo, observability (Prom/Grafana/OpenTelemetry), autoscaling, A/B infra, canary + rollback.
Tech signals:
Previous experience at infra-heavy startups such as Databricks or Roblox
We are committed to being an on-site, in-person team, currently based in San Mateo.