Senior Software Engineer (Data Platform, C++)
About Nebius:
Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.
Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.
Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.
The role
We’re looking for a Software Engineer with strong C++ expertise to join the team that builds and operates Nebius Data Platform: a distributed storage and processing platform that acts as the company’s “source of truth” and the backbone of many internal (and some external) products.
Nebius Data Platform is a single multi-tenant ecosystem based on YTsaurus — instead of running separate HDFS/Kafka/HBase-style systems, we provide storage, compute, and analytics capabilities inside one platform.
Built on top of the open-source YTsaurus ecosystem, we run and extend our own Nebius distribution and develop significant in-house functionality (both core and platform-level). We can design, implement, and roll out features end-to-end on our clusters without waiting for upstream approval, and we contribute upstream when it makes sense.
Today, our largest production cluster runs ~500 servers with ~20k CPU cores and holds ~10 PB of compressed data. It supports workloads ranging from business-critical pipelines and financial transactions to large-scale ML/LLM training datasets and compute.
What’s inside the platform
You’ll work on a system that includes (and ties together):
- Distributed Storage (Cypress): transactional semantics, tiered storage, erasure coding, replication, and strong reliability expectations.
- Compute & ETL: a cluster-wide job scheduler (tens of thousands of cores), MapReduce, YQL for SQL-like data processing, and SPYT (Spark over YTsaurus) for modern data engineering.
- Interactive analytics (CHYT): ClickHouse® instances spun up directly on compute nodes for fast SQL over data in place.
- Dynamic Tables: low-latency NoSQL key-value storage with distributed ACID transactions for OLTP-style workloads and feature stores.
- Orchestracto: workflow orchestration deeply integrated with the platform (Airflow-like, but platform-native).
What you’ll do
We’re looking for engineers who combine strong systems skills with product sense: understanding who uses the platform, why certain capabilities matter, and making pragmatic trade-offs to maximize impact. On our team, engineering work is expected to be connected to real users and outcomes — you’ll regularly align with internal stakeholders, clarify requirements, and help drive prioritization.
In this role, you will:
- Design and implement new functionality in YTsaurus core (C++) with production reliability in mind.
- Build and evolve platform-level capabilities, including the platform architecture and operating model: multi-cluster growth, shared primitives, and a consistent experience that scales with new teams and use cases.
- Improve end-to-end platform experience for internal (and external-facing) users: APIs, guardrails, debugging workflows, and automation.
- Own production quality: incident response and on-call rotation, root cause analysis, and turning learnings into durable fixes.
Example projects
- Roll out sharded YTsaurus masters (incl. Kubernetes operator support) and build automatic balancing of metadata across master cells (consensus groups) to remove control-plane bottlenecks and unlock 10–100x cluster growth.
- Make CHYT interactive SQL faster and more predictable at high load via performance work like data-skipping / min-max-style indexes and improved execution introspection.
- Turn Orchestracto into a platform product by defining the building blocks, developer experience, and governance for how teams create and share workflows.
- Scale and harden Parquet-on-S3 for native YTsaurus workloads by tackling replication/movement, consistent lifecycle semantics, and master-server metadata optimizations for performance and reliability.
- Design and ship complete, trustworthy audit trails for data changes (who/what/when) across heterogeneous storage and compute paths.
Tech stack
- Core: modern C++ (C++20, asynchronous and multithreaded primitives)
- Services & tooling: Go and Python (microservices, utilities, integration tests)
What we expect
- 5+ years of software engineering experience.