Rekavia Ashby

Member of Technical Staff (Data): World Models

Remote · Posted 2mo ago
Other · Staff+ · Full-time · #ai-lab

About the Role

YOUR CHARTER

- Data at Scale: Own the pipelines and storage systems that feed petabyte-scale multimodal datasets into model training.

- Sustainable Platforms: Build tooling and systems that are automated and efficient, enabling processing at scale and handling many small heterogeneous datasets.


REQUIRED SKILLSETS

- Data Engineering: Knowledge of Python ETL pipelines and supporting infrastructure, data formats, and storage systems at scale.

- ML Data Ops: Experience managing datasets, annotations, and data versioning for model training.

- Basic ML Knowledge: Solid grasp of ML fundamentals is essential to collaborate effectively with researchers and make sound data platform decisions.

- Agentic Engineering: Skilled at writing high-quality specifications for AI agents, while maintaining effective human review of AI-generated work.


RESPONSIBILITIES

- Design, automate, maintain, and optimize Python ETL pipelines (Spark/Ray) for large-scale multimodal data.

- Build and maintain data cataloging, lineage, quality tooling, integrity verification, access controls, and lifecycle management systems.

- Provide guidance, internal tools, and documentation to colleagues on data best practices.

- Serve as a custodian of the company’s datasets, ensuring overall data health, quality, and discoverability.


CHALLENGES YOU'LL TACKLE

- Implement high-performance, multimodal data pipelines capable of processing petabyte-scale datasets on tens of thousands of CPUs and hundreds of GPUs.

- Evolve data formats, storage, and processing to keep pace with cutting-edge AI advancements, while maintaining backward compatibility.

- Scale data infrastructure to handle the next order of magnitude in growth.

- At the same time, ensure the data platform is flexible enough to rapidly handle many small heterogeneous datasets and ad hoc analytics queries.


TRAITS OF THE IDEAL CANDIDATE

- High agency and ownership: proactively picks up new work according to priority, manages their own backlog, and escalates early when priorities are unclear or deadlines are at risk.

- Takes responsibility for validating inputs end-to-end: spot-checks data, understands upstream preprocessing, and speaks up when something doesn't add up.

- Takes responsibility for ensuring outputs are correct and handed over: actively seeks sign-off from downstream consumers, communicates caveats, and ensures relevant stakeholders are aware of changes and breaking impacts.

- Cares about continuously improving pipelines, tooling, and processes so that each iteration makes the next one faster, more reliable, and easier for the team.

- Comfortable with rapid, pragmatic solutions when needed, but committed to high-quality, long-term solutions.