TixelJobs

Multimodal AI Jobs (2026)

Multimodal AI combines multiple data types (text, images, video, audio) into unified systems that understand and generate content across modalities. With models like GPT-4o, Gemini, and Claude becoming natively multimodal, demand for engineers who can build cross-modal AI systems is surging.

Last updated: June 27, 2026

Open positions: 180
Avg salary: $170K+
Companies hiring: 13

Latest Multimodal AI Jobs

MYL Instruments | 3mo ago

Machine Learning Engineer – Computer Vision & Multimodal AI

Montreal, Quebec, Canada
#computer-vision
Nextiva | 4mo ago

AI Software Engineer (Multimodal AI Agents)

Microsoft | 3mo ago

Applied Scientist - Multimodal Foundation Models & Robotics

Zurich, Zurich, Switzerland
Xairatherapeutics | New | 1d ago

AI Scientist - Biomedical Multimodal Modeling

South San Francisco, California, United States | $170K - $240K/yr | Full-time
Krafton | 2d ago

[AI Research Div.] Research Engineer - Vision Language Action Game Agent (2+ years of experience / intern)

Seoul | Full-time
Waymo | 4d ago

Senior Machine Learning Engineer, Computer Vision/VLM

Mountain View, CA, USA; San Francisco, CA, USA | Full-time
Hike Medical | 4d ago

Senior Machine Learning Engineer, Multimodal AI

San Francisco, CA | $170K - $300K/yr | Full-time
Waymo | 5d ago

Senior Staff Machine Learning Engineer, LLM/VLM Model Architecture & Optimization

Mountain View, CA, USA; San Francisco, CA, USA | Full-time
Agency | 1w ago

Portuguese Language Data Contributor (Multimodal) – Freelance AI Trainer Project

Remote | Full-time
GovTech Singapore | 1w ago

Data Scientist, Multimodal AI (AI Practice)

Singapore | Full-time
Agency | 1w ago

English Language Data Contributor (Multimodal) – Freelance AI Trainer Project

Remote | Full-time
Agency | 1w ago

Spanish (Latin America) Language Data Contributor (Multimodal) – Freelance AI Trainer Project

Remote | Full-time
Agency | 1w ago

Japanese Language Data Contributor (Multimodal) – Freelance AI Trainer Project

Remote | Full-time
Agency | 1w ago

Korean Language Data Contributor (Multimodal) – Freelance AI Trainer Project

Remote | Full-time
Swordhealth | 1w ago

AI Research Scientist (Multimodal post-training)

Europe | Full-time
Flawless | 1w ago

Senior Applied Scientist - Multimodal

London | Full-time
Unity3D | 1w ago

Staff Machine Learning Engineer - Computer Vision & Multi-Modal AI

San Francisco, CA, USA | Full-time
Natera | 1w ago

Machine Learning Scientist, Multimodal AI

Remote | Full-time
#remote
Waymo | 1w ago

Staff Machine Learning Engineer – VLM/LLM Evaluation

Mountain View, CA, USA; San Francisco, CA, USA; Kirkland, WA, USA; New York City, NY, USA | Full-time
Waymo | 1w ago

Senior Machine Learning Engineer, Perception LLM/VLM

Mountain View, CA, USA; San Francisco, CA, USA | Full-time

Frequently Asked Questions

What are multimodal AI systems?

Multimodal AI systems process and generate content across multiple modalities: text, images, video, audio, and code. Examples include vision-language models (GPT-4o, Gemini, Claude), text-to-image systems (DALL-E, Stable Diffusion), and audio-text models (Whisper). As AI engineer roles have surged 143% year-over-year, multimodal specialists are particularly sought after because building cross-modal systems requires a rare combination of CV and NLP expertise.

What skills do multimodal AI roles require?

Strong foundations in deep learning, transformer architectures, computer vision, and NLP are essential. Experience with cross-modal training, contrastive learning (CLIP), diffusion models, and multimodal evaluation is critical. PyTorch dominates as the primary framework, and Python appears in 47-58% of all AI listings. Cloud platform experience (AWS, GCP, Azure) is necessary for large-scale distributed training. Senior NLP roles at top tech companies can command up to $400K in total compensation.

What is the salary for multimodal AI engineers?

Multimodal AI engineers earn premium salaries due to the breadth of skills required. Computer vision specialists average around $169K, while NLP senior roles at top tech companies reach up to $400K. AI Engineers in this space average $140K-$185K base with total comp around $211K. US roles lead globally at $147K-$176K average, while Western Europe ranges from $72K-$160K. Workers with these specialized AI skills earn approximately 25% more than peers without AI expertise.

AI Job Insights for Multimodal AI Jobs

Salary Range (Yearly, USD)

$140K - $616K

Median $193K from 22 listings with salary data

Top Companies Hiring

Agency (5)
Waymo (4)
MYL Instruments (1)
Nextiva (1)
Microsoft (1)
Xairatherapeutics (1)

Based on recent listings shown on this page.

Common Roles

Machine Learning Engineer – Computer Vision & Multimodal AI (1)
AI Software Engineer (Multimodal AI Agents) (1)
Applied Scientist - Multimodal Foundation Models & Robotics (1)
AI Scientist - Biomedical Multimodal Modeling (1)
[AI Research Div.] Research Engineer - Vision Language Action Game Agent (2+ years of experience / intern) (1)
Senior Machine Learning Engineer, Computer Vision/VLM (1)

Counts reflect recent listings, not total market size.

In-Demand Skills

Computer Vision (1)
Remote (1)

Derived from tags on recent listings.