Xpengmotors via Greenhouse
Staff Machine Learning Engineer – Autonomous Driving Model Quantization & Deployment
Santa Clara, CA · $215K - $364K/yr · Posted 2mo ago
Robotics · Staff+ · Full-time
About the Role
XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.
The Mission: The challenge of Vision-Language-Action (VLA) models and Foundation Models isn't just their intelligence—it's their real-time execution at the edge. We are seeking a high-caliber Staff Machine Learning Engineer to bridge the gap between massive research models and production-ready L4 autonomous driving systems. You will lead the effort to optimize and deploy our VLA models onto vehicle-grade compute platforms for our global fleet.
Key Responsibilities:
- Lead Optimization Strategy: Own the end-to-end quantization and optimization roadmap for large-scale multimodal models (Transformers, VLMs).
- Model Compression: Apply and innovate in PTQ (Post-Training Quantization), QAT (Quantization-Aware Training), and pruning techniques to fit VLA models into strict memory and power envelopes.
- Hardware-Software Co-design: Collaborate directly with model researchers to ensure architectures are "deployment-friendly" and with platform teams to influence future hardware requirements.
- Production Excellence: Develop and maintain robust, safety-critical deployment stacks in Modern C++, ensuring 24/7 stability and deterministic performance on the road.
Basic Qualifications:
- Proven Track Record: 5-8 years of experience in model deployment, quantization, or high-performance computing (HPC).
- Core Technical Skills: Mastery of Modern C++ and deep experience with CUDA or other hardware acceleration libraries.
- Deep Learning Expertise: Strong familiarity with PyTorch and deep knowledge of inference engines like TensorRT, ONNX Runtime, or TVM.
- Quantization Depth: Hands-on experience with INT8/FP8/INT4 quantization and knowledge of the unique challenges in quantizing Large Language Models (LLMs) or Transformers.
- Platform Knowledge: Solid understanding of computer architecture (Cache, Memory Bandwidth, SIMD) and experience with embedded/edge compute constraints.
- Systems Thinking: Ability to debug complex performance bottlenecks across the entire software stack.
Preferred Qualifications:
- Experience with VLA/VLM or other Foundation Model deployment.
- Background in autonomous driving, robotics, or real-time safety-critical systems.
- Contributions to open-source inference or compiler projects.
What we provide:
- A fun, supportive, and engaging environment
- Infrastructure and computational resources to support your ML model development and research
- Opportunity to work on cutting-edge technologies with the top talent in the field
- Opportunity to make a significant impact on the transportation revolution by advancing autonomous driving
- Competitive compensation package
- Snacks, lunches, dinners, and fun activities
The base salary range for this full-time position is $215,280-$364,320, in addition to bonus and equity.