Senior AI Engineer: Agentic Systems & On-Premise Infrastructure
About the Role
The Mission
While most of the industry is building wrappers around closed-source APIs, we are building Sovereign AI. We design, deploy, and scale sophisticated agentic systems on our own private infrastructure. We are looking for a Senior AI Engineer who moves past "prompt engineering" to architect complex, stateful workflows where data security and hardware efficiency are the primary constraints. If you believe the future of AI belongs to those who own the weights and the silicon, you belong here.
The Role
As a Senior AI Engineer, you will own the end-to-end lifecycle of our internal AI ecosystem—from optimizing local inference engines to orchestrating multi-agent swarms. You will build systems that don't just "chat," but reason, execute tools, and maintain state over long-running business processes. This is a role for a "full-stack" AI engineer: someone who understands both the mathematical foundations of LLMs and the systems engineering required to run them at scale on-premise.
Core Responsibilities
- Agentic Orchestration: Architect multi-agent systems using LangGraph or PydanticAI, implementing advanced patterns like Plan-and-Execute, Self-Reflection, and Multi-Agent Handoffs.
- Production RAG: Build high-performance Retrieval-Augmented Generation pipelines using hybrid search and cross-encoder re-ranking over our private vector stores.
- On-Premise Optimization: Manage the full inference stack. You will be responsible for optimizing model performance (latency vs. throughput) on local GPU clusters using vLLM, TGI, or TensorRT-LLM.
- Stateful Architecture: Design resilient workflows that handle long-running tasks, error recovery, and complex "human-in-the-loop" interactions.
- Evaluation & Benchmarking: Develop rigorous, automated "LLM-as-a-judge" frameworks to evaluate agentic performance, ensuring reliability and grounding.
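As one illustration of the evaluation work described above, here is a minimal, framework-agnostic "LLM-as-a-judge" sketch. The `complete` callable, the rubric, and the JSON schema are illustrative assumptions, not a prescribed format; in practice `complete` would wrap whatever local inference backend is in use.

```python
import json

# Hypothetical rubric prompt; the schema and 1-5 scale are assumptions.
JUDGE_PROMPT = """You are an impartial judge. Grade the answer against the reference.
Question: {question}
Answer: {answer}
Reference: {reference}
Respond with only JSON: {{"score": <1-5>, "grounded": <true/false>, "rationale": "<one sentence>"}}"""


def judge(question: str, answer: str, reference: str, complete) -> dict:
    """Score one answer. `complete` is any callable prompt -> text
    (e.g. a thin wrapper around a vLLM or TGI endpoint)."""
    raw = complete(JUDGE_PROMPT.format(
        question=question, answer=answer, reference=reference))
    verdict = json.loads(raw)
    if not 1 <= verdict["score"] <= 5:
        raise ValueError(f"judge returned out-of-range score: {verdict['score']}")
    return verdict
```

Running many such judgments over a fixed question set, with a stronger model as the judge, gives a cheap regression signal for agent changes.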
Technical Requirements
- Agentic Frameworks: Mastery of LangGraph (state machines/persistence) or PydanticAI (type-safe logic/structured outputs).
- Python Mastery: Deep experience in asynchronous Python and building scalable, production-grade backend services (FastAPI, Pydantic).
- Infrastructure & Deployment: Proven experience deploying open-weights models (Llama, Mistral, DeepSeek) on bare-metal or private Kubernetes environments.
- The "Inner Workings": A strong grasp of LLM fundamentals, including attention mechanisms, KV caching, and the impact of tokenization on performance/cost.
Nice to Have (The Extra Mile)
This role is specifically for engineers who enjoy the challenge of hardware constraints and model ownership:
- Model Training & Refinement: Experience with the full training lifecycle:
  - Pretraining: Knowledge of data curation, pruning, and continued pretraining on domain-specific datasets.
  - Post-training: Hands-on experience with Supervised Fine-Tuning (SFT) and alignment techniques like DPO, ORPO, or RLHF.
- Model Compression & Efficiency: Deep knowledge of quantization (GGUF, AWQ, EXL2), model distillation, and graph optimization to squeeze maximum performance out of local hardware.
- VRAM Management: Ability to perform "VRAM math"—calculating memory requirements for various model sizes (e.g., 70B vs 8B) relative to context window length, batch sizes, and KV cache pressure.
- Data Sovereignty: A passion for building private systems where data privacy and infrastructure control are paramount.
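The "VRAM math" mentioned above can be sketched in a few lines. This example assumes fp16 (2 bytes per value) and the published Llama-3-70B shapes (80 layers, 8 KV heads, head dim 128); the function names are illustrative.

```python
def model_vram_gb(n_params_billion: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone (fp16 by default; quantization shrinks this)."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, batch_size: int, bytes_per_val: int = 2) -> float:
    """KV cache footprint: the leading 2 covers the separate K and V tensors."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_val
    return per_token * seq_len * batch_size / 1024**3


# Llama-3-70B at fp16: ~130 GB of weights before any KV cache.
weights = model_vram_gb(70)

# KV cache for one 8K-context sequence (GQA: 8 KV heads, head dim 128).
kv = kv_cache_gb(n_layers=80, n_kv_heads=8, head_dim=128,
                 seq_len=8192, batch_size=1)  # → 2.5 GB
```

The same arithmetic shows why batch size and context length, not just parameter count, decide whether a deployment fits: at batch 32 that same 8K-context cache grows to 80 GB, rivaling the weights themselves.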
Job Type: Full-time
Pay: $5,000.00 - $7,000.00 per month
Benefits:
- Paid time off
Work Location: In person