Genesis AI (via Ashby)
Member of Technical Staff, Training (Bay Area, Remote)
Bay Area · Posted 1w ago
Other · Staff+ · Full-time
About the Role
WHAT YOU’LL DO
- Drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack, from data pipelines to GPU kernels
- Design, build, and optimize distributed training systems (PyTorch) for multi-node GPU clusters, ensuring scalability, robustness, and high utilization
- Implement efficient low-level code (CUDA, cuDNN, Triton, custom kernels) and integrate it seamlessly into high-level training frameworks
- Optimize workloads for hardware efficiency: CPU/GPU compute balance, memory management, data throughput, and networking
- Develop monitoring and debugging tools for large-scale runs, enabling rapid diagnosis of performance regressions and failures
WHAT YOU’LL BRING
- Deep experience in distributed systems, ML infrastructure, or high-performance computing (8+ years)
- Production-grade expertise in Python
- Low-level performance mastery: CUDA/cuDNN/Triton, CPU–GPU interactions, data movement, and kernel optimization
- Scaling at the frontier: experience with PyTorch and training jobs using data, context, pipeline, and model parallelism
- System-level mindset with a track record of tuning hardware–software interactions for maximum utilization