Zyphra (via Ashby)
Data Engineer - Multimodal Systems
San Francisco · Posted 1mo ago
Data Engineer · Mid Level · Full-time
About the Role
Zyphra is an artificial intelligence company based in San Francisco, California.
THE ROLE:
As a Data Engineer - Multimodal Systems, you will be a core contributor to creating, collecting, and improving Zyphra’s datasets and data pipelines across a variety of modalities. Your work will intersect with almost every team at Zyphra. You will be involved in collecting large-scale datasets and implementing and optimizing highly parallel data pipelines.
YOU’LL WORK ACROSS:
- Large-scale data collection across a variety of modalities (text, audio, image)
- Designing and working with highly efficient, parallelized data processing pipelines across modalities
- Designing and running rigorous experimental ablations to demonstrate the impact of new data improvements
WHAT WE'RE LOOKING FOR / REQUIREMENTS:
- Strong implementation and prototyping ability
- Ability to take an idea from conception to experimentation quickly
- Ability to work well with others in a fast-paced research setting
- Ability to rapidly learn new fields, and excitement about implementing new ideas
- Excellent communication and collaboration skills, and the ability to work effectively on both research and engineering implementation at scale
QUALIFICATIONS / ADDITIONAL SKILLS:
- Experience collecting, handling, and processing large datasets
- Experience with parallel Python programming frameworks such as Dask
- Understanding of the state of the art in dataset curation across modalities
- A generally meticulous nature and a strong interest in actually looking at data and sanity-checking things
- Strong grasp of proper experimental methodology for running rigorous ablations and other hypothesis testing
- Understanding of and interest in large-scale, highly parallel data processing pipelines
- Proficiency with PyTorch and Python
- Experience contributing to large pre-existing codebases and rapidly getting up to speed
- Previously published machine learning research in well-respected venues
- Postgraduate degree in a scientific subject (Computer Science, EE/EECS, Mathematics, Physics, Machine Learning)
WHY WORK AT ZYPHRA:
- Our research methodology is grounded in methodical, step-by-step approaches to ambitious goals. Deep research and engineering excellence are equally valued
- We strongly value new and crazy ideas and are very willing to bet big on them
- We move as quickly as we can; we aim to keep the bar to impact as low as possible
- We all enjoy what we do and love discussing AI
BENEFITS AND PERKS:
- Comprehensive medical, dental, vision, and FSA plans
- Competitive compensation and 401(k) plan
- Relocation and immigration support on a case-by-case basis
- In-office snacks and meals provided
- Unlimited PTO and company holidays
- In-person team in San Francisco with a collaborative, high-energy environment