TixelJobs
Psi (via Ashby)

Member of Technical Staff, Infrastructure

Boston · Posted 2d ago
Other · Staff+ · Full-time


About the Role

Overview

Physical Superintelligence is a stealth startup with roots at Google, NVIDIA, Harvard, Meta, MIT, Oxford, Johns Hopkins, Cambridge, and the Perimeter Institute, building AI systems to discover new physics at scale. We are seeking engineers to build platform infrastructure at the intersection of computational science, AI systems, and software engineering.

Our mission is to discover and commercialize transformative physics breakthroughs at scale with artificial superintelligence, safely, verifiably, and for broad public benefit.

The last century's golden age of physics gave us transistors, lasers, and nuclear energy. We believe artificial superintelligence will unlock the next one. We're creating the infrastructure to industrialize scientific discovery and usher in this new era.

We have one product: new physics, at scale.

Role and Responsibilities

Own the full infrastructure stack end-to-end, from cloud foundations through CI/CD pipelines to production deployments. Build and operate multi-cloud infrastructure for our AI platform across GCP, AWS, and adjacent providers. Establish the infrastructure-as-code discipline at PSI: choose the tooling, design the modules, and make every research workflow, training job, and customer-facing AI product deployable through code.

Design and run the release engineering pipeline that ships code from commit to production. Every change flows through automated tests, security scans, and progressive rollouts. Fast, safe deploys are the default; long manual release cycles are not.

Operate the production infrastructure that powers our AI platform at scale: the paid API, model training jobs for our proprietary physics LLM, agentic research workflows, and customer deployments. Define and meet SLOs, build observability and alerting, schedule GPU and CPU capacity, lead incident response.

Be the leverage layer for the rest of engineering. Platform, product, security, and research engineers all depend on you for reliable cloud primitives, fast deploys, and visible production behavior. Write tools they use, not tickets they wait on.

What We're Looking For

Four or more years operating cloud infrastructure in production at companies known for engineering rigor (e.g., Stripe, Cloudflare, Datadog, Snowflake, Databricks, Google, Netflix, or comparable), at multi-cloud scale. You have written code and shipped infrastructure that paying customers, internal teams, or large user bases depend on every day.

Deep fluency with infrastructure as code (Terraform, Pulumi, or comparable), CI/CD systems, Kubernetes, and major cloud platforms (GCP and AWS at minimum). You have built and operated multi-cloud production deployments end-to-end, from initial cloud setup through to release pipelines.

Machine learning and training-workload operations experience: GPU scheduling, distributed training infrastructure, model-serving pipelines, observability for ML systems. You have run production training jobs and shipped production model-serving endpoints.

Operational excellence and on-call discipline. You have led incidents, written runbooks, reduced toil with code, and built systems that scale without bureaucracy. You favor self-service abstractions over tickets and visibility over heroics.

Nice to Have

Built CI/CD or release engineering pipelines from scratch at a fast-growing company.

Hands-on with model serving infrastructure such as vLLM, Triton, or comparable.

Production observability with OpenTelemetry, Prometheus, Grafana, or comparable.

Background in scientific computing, HPC, or research compute environments.

How We Work

We are engineering-led. Engineers own problems end-to-end, from spec to ship to on-call. We write contracts before logic, test against real systems instead of mocks, and favor simple designs that ship over clever ones that do not. Our development process is AI-native: engineers work with agentic coding tools daily, write specs that are legible to humans and agents alike, and lead with leverage.

Location and Compensation

This is an in-person role based in Boston or San Francisco. We offer competitive compensation including salary, benefits, and meaningful early-stage equity. We evaluate on technical breadth, systems thinking, scientific curiosity, and shipping velocity. We are an equal opportunity employer and value diverse perspectives in building platforms for AI-driven discovery.