TixelJobs

Multimodal AI Jobs (2026)

Multimodal AI combines multiple data types — text, images, video, audio — into unified systems that understand and generate across modalities. With models like GPT-4o, Gemini, and Claude becoming natively multimodal, demand for engineers who can build cross-modal AI systems is surging.

Last updated: May 13, 2026

Open positions: 142
Avg salary: $159K+
Companies hiring: 13

Latest Multimodal AI Jobs

MYL Instruments · 2mo ago

Machine Learning Engineer – Computer Vision & Multimodal AI

Montreal, Quebec, Canada
#computer-vision
Nextiva · 2mo ago

AI Software Engineer (Multimodal AI Agents)

Microsoft · 2mo ago

Applied Scientist - Multimodal Foundation Models & Robotics

Zurich, Zurich, Switzerland
GovTech Singapore · 5d ago

Data Scientist, Multimodal AI (AI Practice)

Singapore · Full-time
Xaira Therapeutics · 6d ago

AI Scientist - Biomedical Multimodal Modeling

South San Francisco, California, United States · $170K - $240K/yr · Full-time
DeepMind · 6d ago

Research Engineer, Multimodal Reasoning For Information Literacy

Mountain View, California, US · Full-time
#ai-lab
Niantic Spatial · 6d ago

Computer Vision Researcher (VLM)

London · Full-time
Kodiak Robotics · 1w ago

Senior Applied AI Engineer - Multimodal Transformers

San Francisco Bay Area · Full-time
Iambic Therapeutics · 1w ago

Machine Learning Scientist — Large multimodal models

Boston Office · $148K - $210K/yr · Full-time
Roblox · 1w ago

[2026] Senior Machine Learning Engineer, Multimodal AI, Computer Vision and Graphics - PhD Early Career

San Mateo, CA, United States · Full-time
Waymo · 1w ago

Senior Machine Learning Engineer – VLM/LLM Evaluation

Mountain View, CA | San Francisco, CA | Kirkland, WA | New York City, NY · Full-time
Waymo · 1w ago

Senior Research Scientist, Foundation Model (LLM/VLM)

Mountain View, CA, USA; San Francisco, CA, USA · Full-time
Waymo · 1w ago

Senior Machine Learning Engineer, Computer Vision/VLM

Mountain View, CA, USA; San Francisco, CA, USA · Full-time
Waymo · 1w ago

Senior ML Engineer, LLM / VLM Distillation

Mountain View, California, United States · Full-time
Waymo · 1w ago

Senior Machine Learning Engineer, Perception LLM/VLM

Mountain View, CA, USA; San Francisco, CA, USA · Full-time
Waymo · 1w ago

Senior Machine Learning Engineer, Multimodal Perception (LLM/VLM)

Mountain View, CA, USA · Full-time
Waymo · 1w ago

Applied Research Scientist, Perception LLM/VLM (PhD, New Grad)

Mountain View, CA, USA; San Francisco, CA, USA · Full-time
Waymo · 1w ago

Staff Machine Learning Engineer – VLM/LLM Evaluation

Mountain View, CA | San Francisco, CA | Kirkland, WA | New York City, NY · Full-time
Pinterest · 1w ago

PhD Fall Machine Learning Intern (ATG — Visual, Multimodal, and Recommender Systems)

San Francisco, CA, US; Palo Alto, CA, US; Seattle, WA, US; New York, NY, US · Full-time
Lalamove · 1w ago

Senior AI Engineer (OCR/VLM focus)

Hong Kong SAR · Full-time

Frequently Asked Questions

What are multimodal AI systems?

Multimodal AI systems process and generate content across multiple modalities: text, images, video, audio, and code. Examples include vision-language models (GPT-4o, Gemini, Claude), text-to-image systems (DALL-E, Stable Diffusion), and audio-text models (Whisper). AI engineer roles have surged 143% year-over-year, and multimodal specialists are particularly sought after because building cross-modal systems requires a rare combination of computer vision (CV) and natural language processing (NLP) expertise.

What skills do multimodal AI roles require?

Strong foundations in deep learning, transformer architectures, computer vision, and NLP are essential. Experience with cross-modal training, contrastive learning (CLIP), diffusion models, and multimodal evaluation is critical. PyTorch dominates as the primary framework, and Python appears in 47-58% of all AI listings. Cloud platform experience (AWS, GCP, Azure) is necessary for large-scale distributed training. Senior NLP roles at top tech companies can command up to $400K in total compensation.

What is the salary for multimodal AI engineers?

Multimodal AI engineers earn premium salaries due to the breadth of skills required. Computer vision specialists average around $169K, while senior NLP roles at top tech companies reach up to $400K in total compensation. AI Engineers in this space average $140K-$185K base, with total compensation around $211K. US roles lead globally at an average of $147K-$176K, while Western Europe ranges from $72K to $160K. Workers with these specialized AI skills earn approximately 25% more than peers without AI expertise.

AI Job Insights for Multimodal AI Jobs

Salary Range (Yearly, USD)

$140K - $616K

Median $191K from 20 listings with salary data

Top Companies Hiring

Waymo (8)
MYL Instruments (1)
Nextiva (1)
Microsoft (1)
GovTech Singapore (1)
Xaira Therapeutics (1)

Based on recent listings shown on this page.

Common Roles

Machine Learning Engineer – Computer Vision & Multimodal AI (1)
AI Software Engineer (Multimodal AI Agents) (1)
Applied Scientist - Multimodal Foundation Models & Robotics (1)
Data Scientist, Multimodal AI (AI Practice) (1)
AI Scientist - Biomedical Multimodal Modeling (1)
Research Engineer, Multimodal Reasoning For Information Literacy (1)

Counts reflect recent listings, not total market size.

In-Demand Skills

Computer Vision (1)
AI Lab (1)

Derived from tags on recent listings.
