Platform Engineer, Data
About the Role
Allen Control Systems (ACS) is a cutting-edge defense startup founded by two ex-Navy electrical engineers with a proven track record in robotics and software. We are developing a small, autonomous gun turret that employs advanced computer vision and control systems to precisely target and neutralize small drones and loitering munitions. Our innovative approach requires overcoming significant technical challenges, making this an exciting and dynamic environment for experienced engineers.
With an engineering-first culture, ACS values technical excellence and continuous learning. Backed by our founders' successful exits from two previous ventures acquired for a combined $180M in 2022, we are committed to ensuring that the groundbreaking technologies we develop have a real-world impact.
Position Overview:
We are seeking a Data Platform Engineer who combines expert-level data infrastructure skills with a strong knowledge of AI & Machine Learning principles. In this role, you will go beyond simple data validation scripts; you will apply your understanding of model training dynamics to design and implement existing and novel approaches to optimize our datasets.
You will build and maintain large-scale image and video pipelines, but with a focus on data curation strategies—such as coreset selection, embedding-based filtering, and automated complexity scoring. You’ll partner closely with our ML engineers to orchestrate ingestion, synthetic data generation, and versioned releases, ensuring that every dataset is not only high-integrity and available but strictly optimized to maximize model performance.
What You’ll Do:
- Design and develop a scalable data infrastructure, focusing on organization and curation to support continuing increases in data volume and complexity.
- Design and implement existing and novel approaches to optimize datasets for model training (e.g., hard example mining, class balancing, de-duplication, embedding-based filtering).
- Support the data infrastructure required for optimal ingestion, transformation, and storage of datasets.
- Develop and use synthetic data generation workflows to create realistic synthetic training data for computer vision models.
- Design and own end-to-end image and video pipelines for computer vision model training: multi-source ingestion, QA and visualization, standardization, and organization.
- Coordinate collection of real-world data; coordinate label creation and QA with labelers.
- Develop and use data quality tooling: metrics for balance, drift, and annotation error; active-learning sampling to target gaps; feedback loops from production back to curation.