Lead Technical Architect, AI Infrastructure (Req#1036)
About the Role
Overview
ePlus is seeking a Lead Technical Architect, AI Infrastructure with strong hands-on expertise in enterprise infrastructure design, deployment, and delivery. This role will lead the implementation of next-generation data center solutions that support AI, HPC, and cloud workloads — spanning compute, networking, and storage.
You will be responsible for architecting, building, and automating GPU-accelerated, virtualized, and containerized environments, ensuring performance, scalability, and operational excellence across hybrid infrastructures.
Your Impact
Key Responsibilities
- Design, deploy, and support NVIDIA DGX, HGX, or other GPU-based systems within customer environments.
- Install and configure GPU platforms, including drivers, firmware, and management tools.
- Configure high-speed networking (InfiniBand, Ethernet, VLANs) and validate performance.
- Provision compute nodes, configure OS images, and automate deployments.
- Implement and manage virtualization platforms (VMware ESXi, vCenter, vSAN, NSX) and hyperconverged infrastructure.
- Build and administer containerized platforms using Kubernetes (RKE, OpenShift, EKS, AKS, GKE).
- Integrate storage systems and ensure high-performance data access for workloads.
- Implement infrastructure automation and assist with configuration management.
- Collaborate with cross-functional teams (networking, DevOps, storage, and application owners) to ensure smooth project delivery.
- Troubleshoot and optimize system performance across compute, network, and storage layers.
- Provide technical leadership and documentation for customer deployments and internal delivery teams.
- Travel expected (less than 15%) for customer design sessions and presentations.
Qualifications
- 6+ years of experience in data center architecture, infrastructure delivery, or systems engineering roles.
- Working experience with GPU platforms and GPU drivers.
- Hands-on experience with Linux systems, virtualization, and Kubernetes.
- Strong networking knowledge, including InfiniBand, Ethernet, VLANs, and RDMA.
- Experience with automation tools and scripting (e.g., Ansible, Terraform, Bash, Python).
- Understanding of container orchestration and distributed workloads across multiple Kubernetes distributions.
- Excellent troubleshooting, documentation, and customer-facing communication skills.
- Ability to deliver complex projects independently, on time, and in coordination with remote teams.
Position Specifics
The initial base salary range for this position is expected to be between $125,000 and $170,000 annually. The final base salary offered will be determined by multiple factors, including, but not limited to, job-related knowledge, depth of experience, skills, certifications, and geographic location. In addition to base salary, our compensation package may include other components such as commissions and discretionary bonuses.
ePlus offers a full range of medical, financial, and/or other benefits (including 401(k) eligibility, employee stock purchase program and various paid time off benefits, such as vacation, sick time, and personal leave), dependent on the position offered. Details of participation in these benefit plans will be provided if an offer of employment is extended.
If hired, employee will be in an “at-will position” and the Company reserves the right to modify base salary (as well as any other discretionary payment or compensation program) at any time, including for reasons related to individual performance, Company or individual department/team performance, and market factors.