Learn Prompting #10: Multimodal AI Showdown: Big Players, Limited Seats, and a Dash of FOMO
From Google’s Veo to OpenAI’s Sora, the multimodal AI space is heating up. Plus, explore the future of AI and learn GenAI with 15 hands-on courses!
Hey all, welcome to this week’s newsletter!
Today, we’ll cover:
- The multimodal AI boom: new video and image models from Google, OpenAI, Luma AI, xAI, and Amazon
- Our collection of 15 hands-on Generative AI courses
- 10 predictions for 2025 from the State of AI Report 2024
Let’s dive in!
The Multimodal AI Boom
Remember when ChatGPT stunned the world by making large language models (LLMs) accessible to almost anyone? Back in 2023, LLMs were the tech world’s hottest topic, dominating headlines and conversations.
But 2025 is almost here, and the predictions that 2024 would be the year of multimodal AI have proven spot on, especially for models that combine vision and language capabilities.
Multimodal AI combines different data types, like text and visuals, into one cohesive model.
As 2024 wraps up, the multimodal AI boom is in full swing, with major generative AI players rolling out their latest creations. Here’s the breakdown:
Video Models: The Hot Topic This Week
This week, both Google and OpenAI launched video generation models, almost as if they coordinated their releases. But hold your excitement—access is still highly restricted.
Veo is impressive! Here’s an example of a frame from a video generated by Google Veo. Prompt: Timelapse of a common sunflower opening, dark background
Google introduced Veo, its first AI model for video creation, available via Google’s Vertex AI platform. Want to try it? You’ll need to join the Trusted Tester Waitlist (must be 18+ and in the U.S.).
OpenAI’s Sora video generation model has also launched, sort of. It’s currently available only in select regions (users in the EU, for instance, can’t access it yet), and some features, such as generating videos with realistic human characters, are limited to select individual users.
Rising star Luma AI, known for its Dream Machine visual-generation app, unveiled Ray 2, a next-gen model for creative video generation on AWS. Dream Machine is now live on the web and mobile.
Here’s an example of an image generated with the Dream Machine app. Prompt: an astronaut flying in space. Dream Machine can also generate video right from this image!
Image Models
On the image generation front, huge updates are shaking up the scene:
Grok has been updated with image-generation capabilities via a model code-named Aurora! It could be a real competitor to existing image-generation models. Grok is also now available to everyone: free users can send 10 messages every 2 hours.
Here’s how Grok sees Learn Prompting on X. Prompt: Draw me
Google unveiled Genie 2, its large-scale foundation world model. Hot on the heels of Fei-Fei Li's World Labs, it lets users create expansive 3D environments.
Amazon introduced a new family of multimodal models called Nova, featuring:
Four text-generation models: Micro, Lite, Pro, and Premier (Premier debuts in early 2025).
An image-generation model, Nova Canvas, and a video-generation model, Nova Reel, both launched on AWS this week.
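If you want to experiment with Nova once you have Bedrock access, here’s a minimal sketch of calling Nova Canvas from Python with boto3. The model ID and request schema below are assumptions based on Bedrock’s conventions for its image models, so double-check the AWS docs before relying on them:

```python
# Minimal sketch: text-to-image with Nova Canvas via Amazon Bedrock (boto3).
# The model ID and request/response schema are assumptions; verify against the Bedrock docs.
import base64
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

request_body = {
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {"text": "an astronaut flying in space"},
    "imageGenerationConfig": {"numberOfImages": 1, "width": 1024, "height": 1024},
}

response = client.invoke_model(
    modelId="amazon.nova-canvas-v1:0",  # assumed model ID
    body=json.dumps(request_body),
)

payload = json.loads(response["body"].read())
# Assumes the response returns a list of base64-encoded images.
image_bytes = base64.b64decode(payload["images"][0])

with open("astronaut.png", "wb") as f:
    f.write(image_bytes)
```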
Other GenAI News
Some notable updates from other players in the AI race:
Microsoft Copilot Vision
Microsoft’s Copilot can now browse the web with you. Dubbed Copilot Vision, the feature is rolling out to a limited number of Pro subscribers in the U.S., letting them browse alongside the AI in Edge. It looks like a direct answer to Claude’s computer use, although Claude is meant to work with any app on your computer.
Reddit Answers
Reddit is testing AI-powered Reddit Answers, a feature designed to provide quick responses based on platform posts. It’s accessible via a new button on Reddit’s homepage, leading to a dedicated Q&A page. Initially, it’s available to a limited U.S. audience.
OpenAI’s Canvas for All
OpenAI has rolled out Canvas to all users, including free accounts. Canvas is a new interface for ChatGPT that lets you collaborate on writing and coding projects. It opens in a separate window, offering a more immersive way to work with ChatGPT.
canvas is now available to all chatgpt users, and can execute code!
more importantly it can also still emojify your writing.
— Sam Altman (@sama)
6:48 PM • Dec 10, 2024
As the multimodal AI landscape heats up, we’re seeing some serious competition among the big players.
Which model are you most excited about trying? Let us know—hit reply and share your thoughts! Or write in the comments below 🧑🚀
Explore Generative AI with Our Courses
Our collection of 15 courses
Multimodal AI is evolving fast, with models blending vision, language, and so much more. But why just watch it happen when you can be part of the action?
We’ve put together 15 specialized courses under one subscription to help you get hands-on with the tools and techniques driving Generative AI. Whether you’re a beginner or looking to level up, there’s something here for you.
Jump in today with a 3-day free trial!
Curious about 2025? 10 Predictions from the State of AI Report 2024
Executive Summary slide from State of AI Report 2024
The State of AI Report 2024 was recently released as a presentation of more than 200 slides. We’ve summarized its 10 predictions for 2025 for you:
$10B+ Sovereign Investment triggers national security review.
No-code App Success goes viral in the App Store.
Data Collection Reforms after legal trials.
Softer EU AI Act implementation due to overregulation concerns.
Open-source Model Surpasses OpenAI o1 in reasoning benchmarks.
NVIDIA’s Dominance Continues with no significant market challenges.
Humanoid Investment Declines due to product-market fit struggles.
Apple’s On-device AI drives personal AI assistant momentum.
AI-generated Research Paper accepted at a major ML conference.
AI-driven Video Game achieves mainstream success.
We also collected the key takeaways in one article. Take a look.
Thanks for reading! We’d love your feedback to make this newsletter even better. Our goal is to share content that’s valuable and relevant to you. Help us fine-tune future editions by sharing your thoughts on this week’s email—it’ll only take a moment!
How was this week's email?