Apiphany via Ashby
Associate Data Scientist
Remote · Posted 2mo ago
Data Scientist · Entry Level · Full-time
About the Role
ROLE OVERVIEW
We are seeking an Associate Data Scientist to support AI/ML engineering efforts by preparing, validating, and structuring data for LLM-driven systems. This is a hands-on role focused on real-world data processing, pipeline support, and model evaluation.
KEY RESPONSIBILITIES
- Process and clean structured and unstructured data for AI/ML pipelines.
- Prepare training-ready datasets for LLM fine-tuning and evaluation workflows.
- Support RAG and NL→SQL systems through data preparation and validation.
- Perform data quality checks to ensure completeness and consistency.
- Assist in building and maintaining data pipelines and APIs (e.g., FastAPI).
- Collaborate with engineering teams to troubleshoot and optimize data workflows.
REQUIRED SKILLS
- 2+ years of experience in data processing or data-focused roles.
- Strong Python skills with experience in data libraries (Pandas, NumPy, Scikit-learn).
- Experience supporting LLM workflows (fine-tuning, prompt engineering, evaluation).
- Familiarity with structured (SQL) and unstructured text data.
- Understanding of data preparation for AI/ML systems.
NICE TO HAVE
- Exposure to RAG pipelines, embeddings, or evaluation metrics.
- Experience with ML frameworks (PyTorch/TensorFlow) and Docker-based workflows.
- Experience with CI/CD pipelines for ML systems.
- Familiarity with vector databases (e.g., Chroma) and reranking techniques.
- Research exposure to Transformer-based architectures.
Note: This position is open to candidates residing in India only.