Category: large-multimodal-models
-
How to Apply Powerful AI Audio Models to Real-World Applications
Learn about the different types of AI audio models and the application areas they can be used in.
Author: Eivind Kjosbakken. Originally published on Towards Data Science.
-
Unlocking Multimodal Video Transcription with Gemini
Explore how to transcribe videos, including speaker identification, with a single prompt.
Author: Laurent Picard. Originally published on Towards Data Science.
-
Multimodal Search Engine Agents Powered by BLIP-2 and Gemini
This post was co-authored with Rafael Guedes.
Introduction
Traditional models can only process a single type of data, such as text, images, or tabular data. Multimodality is a trending concept in the AI research community, referring to a model’s ability to learn from multiple types of…