Embedding Vc via Ashby
Member of Technical Staff - Efficient ML
San Francisco Bay Area · Posted 4mo ago
ML Engineer · Staff+ · Full-time
About the Role
Introducing Moonlake, AI for creating world simulations.
SCOPE OF WORK
Training efficiency
- Dataloaders, fusion, activation remat, gradient checkpointing.
- FSDP/ZeRO, tensor and pipeline parallelism; NCCL tuning.
GPU + kernel performance
- Nsight profiling, Triton/CUDA kernels, fused ops.
- Flash-attention–style speedups, sequence packing, KV-cache tricks.
Inference optimization
- Low-latency serving, continuous batching, speculative decoding.
- Quantization (GPTQ/AWQ), distillation, pruning.
Infra + reliability
- SLURM/K8s multi-node jobs, checkpoint hygiene.
- Determinism, env pinning, GPU failure handling.
We are committed to being an on-site, in-person team, currently based in San Mateo.