Exactera via Greenhouse

Principal AI Engineer

REMOTE · Posted 2d ago

ML Engineer · Staff+ · Full-time

About the Role

Exactera has offices in New York City; Tarrytown, NY; San Diego, CA; London; and Argentina.

The Role

As Principal AI Engineer, you will own the inference and intelligence layer of the Exactera platform. You will build the substrate that agentic workflows run on: the domain knowledge graph and structured representations of expert reasoning, the hybrid retrieval that operates over them, model serving and LLM integration, the agent platform, the evaluation harness that captures expert judgment as ground truth, and the interfaces that product engineers compose into customer-facing workflows.

You report directly to the CTO and work closely with the data engineering team (close collaboration on data structure, entity resolution, and the contract between the lakehouse and the intelligence layer), product engineering (who consume your interfaces to build agentic workflows), product management (who set product direction and prioritize the workflows the platform supports), and domain experts in tax advisory (who provide the ground truth and judgment the AI systems are evaluated against — and whose decisions you will help capture as first-class data).

This is an individual contributor role with architectural authority over the AI/ML platform. You will set technical direction, make build-versus-buy decisions, and provide direction to other senior engineers working on platform components. You will have meaningful input on stack evolution and are expected to evaluate the stack against real workloads and propose changes when warranted.

What You Will Build

We have made initial choices you will inherit and refine. The data platform runs on Databricks with Unity Catalog for governance. We use MLflow for experiment tracking and model lifecycle. Our LLM integrations use Anthropic and OpenAI APIs. MCP is our current pattern for exposing capabilities to agentic workflows, with production gateways already in service. We are on AWS, with Terraform for infrastructure-as-code. Exact tool experience matters less than having strong, defensible opinions about the categories.

Knowledge and Reasoning Substrate

This is the headline of the role. Tax-advisory work is relationship-heavy: companies own subsidiaries, subsidiaries transact, transactions map to jurisdictions, jurisdictions have regulatory frameworks, and comparable selections are justified against multi-dimensional functional profiles. A vector store can suggest candidates; it cannot defend a selection under audit. The substrate you build is what makes the rest of the platform compliance-grade.

Domain ontology and knowledge graph. Design and operate the typed graph of tax-domain entities and relationships — companies, transactions, jurisdictions, segments, functional profiles, comparability factors, expert decisions, and reports — with relationships reified (e.g., comparable-to as an edge carrying which dimensions, who weighted them, in which report, with what outcome). Stack choice is yours.
Entity resolution and relationship extraction at scale. Pipelines that resolve entities across 34,000 reports, SEC/EDGAR filings, and customer data into a canonical, versioned representation. Relationship extraction from free text into structured edges. Reconciliation against curated reference data (NAICS, ownership filings).
Expert judgment as first-class data. Capture practitioner decisions (selections, rejections, weightings, and the reasoning behind them) as structured entities attached to the graph, versioned and queryable. This is Exactera's proprietary moat, encoded as data.
Versioned graph snapshots. The graph evolves; audit defense requires being able to reconstruct exactly what it looked like at any point in time. Design the versioning and snapshot strategy.

Hybrid Retrieval

Retrieval is three modes — graph traversal, vector search, and structured queries — and a planner that decides which combination to use per task. "Find comparable companies for this intercompany loan" is a different retrieval shape than "summarize prior treatment of intercompany IP licensing in EMEA." The retrieval layer makes the right combination of moves automatically.

The full RAG pipeline, as one retrieval mode within the larger system: chunking strategies, embedding generation, index management, retrieval optimization, and context assembly for LLM consumption. Embedding pipelines for heterogeneous data, with index maintenance as source data and the graph evolve.
Graph-enhanced retrieval. Graph traversal for relationship-aware lookups, graph-guided chunking that respects entity boundaries, graph context assembly that pulls in related entities and prior precedents alongside narrative text.
Structured retrieval. First-class structured queries (jurisdiction, year, industry code, transaction value range) as a peer mode, not an afterthought.
Retrieval planning and orchestration. The component that chooses, for a given task or sub-task, which retrieval modes to use, in what order, and how to fuse results. This is its own non-trivial design problem.
Retrieval feedback loops. Every retrieved-and-used result is evidence; the substrate compounds over time.
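
As an illustration of the result-fusion step described under retrieval planning and orchestration, here is a minimal sketch using reciprocal rank fusion (RRF), one common way to merge ranked lists from several retrieval modes. The posting does not say which fusion method Exactera uses, so the technique, the function name, the mode names, the entity ids, and the constant k=60 are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists from several retrieval modes into one ranking.

    rankings: dict mapping a mode name to a list of ids, best first.
    k: damping constant (60 is a conventional default, assumed here).
    """
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each mode contributes more for ids it ranks higher.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the three retrieval modes named above.
rankings = {
    "graph": ["co_17", "co_42", "co_03"],
    "vector": ["co_42", "co_99", "co_17"],
    "structured": ["co_42", "co_17"],
}
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

RRF needs only rank positions, not comparable scores, which is why it is often a reasonable baseline when the modes (graph, vector, structured) produce scores on different scales.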

Reasoning, Memory, and Agent Platform

Exactera's products operate in regulated, high-stakes domains where AI outputs have to be defensible to tax authorities. The bar is compliance-grade AI: systems where errors carry real consequences and outputs have to hold up under audit. You will build the platform that lets agentic workflows operate at this standard:

Agent memory infrastructure: short-term (conversation context), long-term (the graph itself), and episodic (per-customer history). The patterns that let agents operate across sessions without losing state or reasoning.
Tool access and permissioning: the layer that gives agents controlled, auditable access to data and capabilities, with constraints appropriate to the action being taken. Integrates with the platform's existing tenant and grant model.
Determinism and audit trail, end-to-end. Versioned graph snapshots, versioned indexes, versioned prompts, versioned model snapshots, versioned expert-judgment datasets — all coherent enough to replay exactly what the system saw when it made a specific decision. This is not just logging; it is replayability for audit defense.
Observability: tracing, logging, and monitoring for agent execution, with the ability to reconstruct why an agent made a specific decision.

Evaluation and Expert Judgment as Ground Truth

Evaluation is a pillar, not a sub-bullet. For compliance-grade AI, it is how you safely change anything in the stack, how you measure whether the system is actually replicating expert judgment, and how you defend outputs after the fact.

Ground-truth capture. Design the system that turns practitioner decisions into versioned, queryable evaluation data — labels, rationales, weights, outcomes, dissents.
System-level evaluation. Frameworks that exercise retrieval quality (precision/recall against expert-labeled relevant sets), decision quality (does the system pick what the expert picked, and if not, why?), justification quality (does the rationale survive expert review?), and end-to-end quality (does the synthesized output match a defensible expert report?).
Regression detection in CI and in shadow against production, with the operational discipline to act on it.
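
To make the retrieval-quality measure above concrete, here is a minimal sketch of precision and recall computed against an expert-labeled relevant set. The function name and the sample ids are hypothetical, and a real harness would likely also track rank-aware metrics and versioned label sets.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of retrieved ids against expert labels."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty inputs so the metrics stay well defined.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: the system retrieved four candidates,
# and the expert labeled three items as relevant.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
print(f"precision={p:.2f} recall={r:.2f}")
```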

Model Serving and Inference Infrastructure

Production model serving: real-time inference endpoints for classification, extraction, and decision-support models, with latency, cost, and reliability SLAs.
Experiment tracking, model registry, and deployment lifecycle in production environments.
Structured extraction at scale, as a first-class workload: extracting entities, transactions, comparable sets, and reasoning patterns from semi-structured and unstructured documents into the knowledge substrate above.

LLM Integration and Cost Management

Production patterns for LLM API integration: cost optimization, token management, prompt caching, rate limiting, fallback routing, and observability.
Cost models that scale sublinearly with corpus growth. Decisions about
