Google’s new AI tool uses image prompts instead of text

By CNN Newsource

Published December 17, 2024 1:47 PM

By John Towfighi, CNN

(CNN) — Google’s newest artificial intelligence tool, “Whisk,” lets people upload photos to get back a combined, AI-generated image – even without users inputting any text to explain what they want.

Users can input images depicting a subject, a scene and a style, and Whisk then combines them into one image.

Whisk is a “creative tool” for quick inspiration, Google said in a blog post, as opposed to a “traditional image editor.” In essence, Whisk is meant as a fun AI feature rather than a tool for refined, professional work.

Big Tech companies like Google and OpenAI are racing to release consumer products that can showcase uses for the snazzy new technology, even as naysayers warn that the lack of guardrails around the development of AI poses dangers for humanity.

Since OpenAI initially launched its text-to-image creation tool, Dall-E, in 2021, the concept of AI-generated artwork has swamped social media and become a focus of consumer products. Google’s Whisk is an image-to-image generator, building upon the popular concept of text-to-image generators.

People using Whisk can “remix” the final image by editing their inputs and mixing the categories to produce different results, such as a plushie toy, an enamel pin or a sticker. Users can add text if they want to direct certain details, but text is not required to create an image.

“Whisk is designed to allow users to remix a subject, scene and style in new and creative ways, offering rapid visual exploration instead of pixel-perfect edits,” Thomas Iljic, a director of product management at Google Labs, said in a statement.

Google’s Whisk is built upon the generative AI developed by DeepMind, the AI lab that Google acquired in 2014.

Whisk works by using Google’s core AI offering, Gemini, which debuted in December 2023, and pairing it with Imagen 3, the latest text-to-image generator released by DeepMind in December.

When users upload their images, Gemini generates a caption which is fed into Imagen 3. The process captures the “essence” of the subject as opposed to an exact replica, which allows for remixing the final image but also means the end product might stray from the prompt.

For example, the generated image might show a different height, hairstyle or skin tone than the prompt images, Google said in a blog post.

When Google first rolled out Gemini’s text-to-image creator in February, the company faced initial backlash because the tool produced historically inaccurate images.

Whisk is initially available as a website on Google Labs for users in the US and is in its early stages of development, the company said.

OpenAI also recently released a text-to-video generator called Sora, highlighting the competition for consumer products.

Dan Ives, managing director and senior equity analyst at Wedbush Securities, told CNN that Whisk is another “flex the muscles moment” for Google in the AI and tech race.

“DeepMind is a key asset for Google,” Ives said, noting that AI products are part of Google’s “treasure chest” of new products for 2025, which also includes a new Android operating system built in collaboration with Samsung and Qualcomm.
