Over the Next One to Three Years, Multimodal GenAI Models Will Increasingly Enrich More Applications
“The shift to multimodal enterprise software is a fundamental transformation in business operations and innovation,” said Roberta Cozza, Sr. Director Analyst at Gartner. “Multimodal generative AI (GenAI) will revolutionize enterprise applications by adding previously unattainable features and functionalities, impacting sectors like healthcare, finance, and manufacturing. By enhancing domain-specific language models, it will improve accuracy, automate operations, and drive contextual decision intelligence, enabling AI to take proactive actions across tasks.”
High-impact technologies such as multimodal GenAI models are at the center of Gartner’s Emerging Tech Impact Radar for GenAI (see Figure 1). Product leaders will have to make critical decisions on investing in these emerging GenAI technologies to enable customers to reach new heights of value in their business.
Figure 1: Emerging Tech Impact Radar: Generative AI
Source: Gartner (July 2025)
Today, many multimodal models process two or three modalities (e.g., text-to-video or speech-to-image). Over the next few years, that number will grow as models add new and more diverse modalities.
“Enterprises should focus on integrating multimodal capabilities into their software to enhance user experiences and operational efficiency. By leveraging the diverse data inputs and outputs that multimodal GenAI offers, businesses can unlock new levels of productivity and innovation,” said Cozza.