Aleph Alpha via Ashby
Senior AI Researcher - Pre-training (f/m/d)
Heidelberg · Posted 2w ago
Research · Senior · Full-time · #ai-lab
About the Role
OUR MISSION
Aleph Alpha is one of the few companies in Europe doing serious foundation model pre-training. Our customers — in finance, manufacturing, and public administration — need models that understand German, meet European regulatory requirements, and work reliably in high-stakes settings. We’re building that in Heidelberg.
We are hiring a Senior AI Researcher to join our Pre-training team and to advance the architecture and training of our next generation of foundation models. If you are excited about designing inference-efficient architectures, optimising training recipes that scale reliably, and training models on a large-scale cluster (thousands of NVIDIA Blackwell GPUs), we would love to hear from you.
TEAM CULTURE
We foster a culture built on ownership, autonomy, and empowerment. Teams and individual contributors are trusted to take responsibility for their work and drive meaningful impact. We maintain a flat organisational structure with efficient, supportive management that enables quick decision-making, open communication, and a strong sense of shared purpose. We collaborate closely on complex technical problems, working in pairs or using mob programming to resolve challenging issues.
ABOUT THE ROLE
As a Senior AI Researcher in Pre-training (f/m/d), you will own the critical technical levers that determine the success of our next-generation models: architecture, optimization, stability, and scaling.
Working at the high-leverage intersection of research and engineering, you will translate mathematical reasoning and empirical observations into principled training decisions - from small-scale proxy experiments to multi-thousand-GPU runs.
We are looking for an expert who can combine rigorous experimental design with high-quality production code, directly influencing model quality, run reliability, and the efficiency of the models we ship.
YOUR RESPONSIBILITIES
- Recipe & Architecture Optimization: Own core elements of the training recipe (optimizers, schedules, initialization) and design PyTorch-based architectural improvements to maximize convergence, stability, and training efficiency.
- Scaling Strategy & Predictability: Develop hyperparameter scaling laws and scale-up methodologies, using small-scale proxy experiments to reliably predict multi-thousand-GPU behavior and de-risk major training decisions.
- Stability, Diagnostics & Debugging: Investigate complex convergence issues (loss spikes, divergence) and resolve hard-to-reproduce distributed system failures like communication bottlenecks, race conditions, and synchronization errors.
- System-Model Co-Design: Partner with Compute Performance, Data, Evaluation, and Post-Training teams to align the model lifecycle with hardware constraints, memory bandwidth, and communication topologies.
CORE QUALIFICATIONS
- You are proficient in Python and deeply familiar with PyTorch-based training workflows.
- You have a strong track record in machine learning research and software engineering, demonstrated through shipped models, impactful open-source contributions, or published research.
- You have a strong mathematical foundation and are comfortable reasoning formally about optimisation, scaling behaviour, and training dynamics.
- You deeply understand transformer training dynamics, optimisation, and the behaviour of large distributed training jobs.
- You can design rigorous experiments, reason clearly from noisy results, and translate empirical observations into robust training decisions.
- You have hands-on experience pre-training large models (e.g., 7B+ parameters) on substantial infrastructure (e.g., 100+ GPU clusters).
- You apply strong software engineering practices, including writing maintainable, well-tested code and supporting reproducible experimentation workflows.
- You are able to implement complex model architectures efficiently and reliably and to debug complex issues across model code, training dynamics, and distributed systems.
- You collaborate effectively within a research and engineering team and communicate clearly about your work across Pre-training and the broader AAR/AA organization.
- You are able to work in Germany and collaborate regularly on site in Heidelberg as part of the Pre-training team.
PREFERRED QUALIFICATIONS
(We encourage you to apply even if you don't check every box!)
- Large-Scale Training: Hands-on experience training LLMs or multimodal models on large GPU clusters using distributed frameworks (e.g., Megatron-LM, DeepSpeed, torchtitan).
- Predictive Scaling: Familiarity with scaling laws, hyperparameter transfer, or methods for predicting large-scale training behavior from smaller proxy runs.
- Stability & Performance: Experience profiling distributed jobs and diagnosing training anomalies like loss spikes, numerical instability, or optimizer pathologies.
- Advanced Architectures: Exposure to sparse training approaches (e.g., Mixture-of-Experts) and an understanding of their routing and systems trade-offs.
- Track Record of Impact: Demonstrated research excellence through top-tier publications (NeurIPS, ICML, ICLR), impactful open-source contributions, or significant shipped technical work.
- Systems Curiosity: Low-level kernel optimization is not required, but we highly value a strong curiosity about the hardware and systems constraints that shape scale.
What we offer
- Become part of an AI revolution!
- 30 days of paid vacation
- Access to a variety of fitness & wellness offerings via Wellhub (https://wellhub.com/de-de/)
- Mental health support through nilo.health (http://nilo.health)
- Substantially subsidized company pension plan for your future security
- Subsidized Germany-wide transportation ticket
- Budget for additional technical equipment
- Flexible working hours for better work-life balance and a hybrid working model
- Virtual Stock Option Plan
- JobRad® bike lease (https://www.jobrad.org/)