Xpengmotors via Greenhouse

Staff Software Engineer - AI Infrastructure

Santa Clara, CA · $179K - $304K/yr · Posted 1mo ago
MLOps · Staff+ · Full-time

About the Role

XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.

 
We are looking for a versatile Machine Learning Infrastructure Engineer to join XPeng’s Fuyao AI Platform team, a core AI infrastructure group powering autonomous driving, robotics, and intelligent cockpit applications. You will build and optimize next-generation AI infrastructure spanning data loading, dataset and data production systems, large-scale inference, and distributed compute platforms, with a strong focus on efficiency, scalability, and reliability.
 
Job Responsibilities
  • Contribute to one or more of the following areas:
    • Design and optimize large-scale data processing, production and loading pipelines, supporting heterogeneous data types (images, videos, point clouds, sensor streams, etc.).
    • Build and maintain high-performance dataset management and loading frameworks, ensuring low-latency, high-throughput pipelines for training and inference.
    • Develop and optimize distributed compute and inference systems, including scheduling, resource utilization, and performance tuning.
  • Collaborate with cross-functional teams (e.g. Algorithms, Data Lakehouse) to translate requirements into production-ready infrastructure solutions.
  • Continuously monitor, profile, and eliminate bottlenecks across the AI data, inference, and compute stack.

 

Basic Qualifications

  • Master’s degree in Computer Science, Software Engineering, or equivalent experience.
  • 5+ years of experience in large-scale data processing or ML infrastructure.
  • Proficient in Python with solid software engineering fundamentals, clean coding practices, and strong debugging skills.
  • Hands-on experience with relational databases and NoSQL systems, including metadata and cache management; prior experience with large-scale VectorDB is highly desirable.
  • Familiarity with Linux file systems and network I/O optimization for distributed or object storage.
  • Strong communication skills and ability to work cross-functionally in fast-paced environments.
  • Strong ability to learn quickly, adapt to new challenges, and proactively explore and adopt new technologies.  
 
Preferred Qualifications
  • Familiarity with the autonomous driving industry and enthusiasm for its challenges.
  • Experience with distributed computing frameworks such as Ray, Flink or Spark.
  • Experience in building and scaling ML infrastructure in cloud-native environments.
  • Experience in any of the following areas:
    • Large-scale deep learning training or inference optimization focused on scalability and model acceleration.
    • Columnar storage formats (Parquet/ORC) and related ecosystems, including partitioning, compression, and vectorized I/O optimization.
    • Large-scale data loading frameworks (PyTorch DataLoader, Hugging Face Datasets).
 
The base salary range for this full-time position is $179,400-$303,600, in addition to bonus, equity an