
Google's Gemini Can Now Turn Anything Into Anything — And It's Kind of Wild

Google's latest Gemini AI model can transform nearly any type of media into any other: text, images, audio, and video can each be converted into the others. The technology is impressively capable and surprisingly accessible, raising fresh questions about the line between creative fun and AI-generated slop.

·ottown·3 min read

Google Just Made AI Even More Powerful — and More Unsettling

Google's latest Gemini model can take almost anything you throw at it — a photo, a voice clip, a block of text — and transform it into something else entirely. Text to video. Image to audio. Video to text. The new "anything-to-anything" capability is being called one of the most fluid multimodal AI systems ever released to the public, and hands-on testing suggests the hype is largely warranted.

The Verge put the model through its paces this week, using it to generate realistic videos of a stuffed animal toy appearing to go on vacation — a callback to a Gemini ad Google ran last year. What they found was striking: the tools are genuinely good, the results are surprisingly convincing, and the effort required to produce them is minimal.

What "Anything-to-Anything" Actually Means

Most AI models have a lane. Some generate images. Some transcribe audio. Some write text. What makes the new Gemini approach different is that it's designed to move fluidly between all of these modalities in a single session.

You could, in theory, describe a scene in text, have the model generate an image, then animate that image into a short video, then extract audio narration from the video — all without switching tools or re-uploading files. It's a level of creative pipeline compression that would have required a team of specialists just a few years ago.

The Slop Problem Isn't Going Away

All of this capability comes bundled with a thorny question the tech industry still hasn't answered cleanly: at what point does AI-generated content cross from harmless creative play into something more corrosive?

The Verge's experiment, a deepfake of a stuffed animal made for a personal project and never shown to a child, is a benign example. But the same tools can produce convincing fake footage of real people, fabricated news events, or synthetic media designed to mislead. The gap between "this is fun" and "this is harmful" is narrowing as the tools get better and easier to use.

Google has said it builds safeguards into Gemini to prevent obvious misuse, but independent researchers have consistently found ways around such guardrails. The company, like every major AI lab, is racing ahead of the regulatory and ethical frameworks that would govern the technology.

Why This Moment Matters

For everyday users, the headline is that generative AI has become dramatically more accessible. You don't need technical skills, expensive software, or industry connections to produce media that looks professionally made. That democratization has genuine upside — for artists, educators, small businesses, and curious people.

But the same accessibility that lets a parent create a whimsical vacation video for a stuffed animal also lowers the barrier for bad actors. As the tools improve, the public — and policymakers — will need to develop much sharper instincts for evaluating what's real.

Gemini's new model is a milestone in what AI can do. Whether it's a milestone in what AI should do is a harder question, and one the industry has shown little urgency in answering.

Source: The Verge
