Nano Banana 2: Google's latest AI image generation model

Some random predictions about what AI image generation tools will do/are doing to art:

1. The narrative/life of the artist becomes a lot more important. The most successful artists are ones that craft a story around their life and art, and don't just create stuff and stop. This will become even more important.

2. Originality matters more than ever. By design, these tools can only copy and mix things that already exist. But they aren't alive, they don't live in the world and have experiences, and they can't create something truly new.

3. Those that bother to learn the actual art skills, and not merely prompting, will increasingly be miles ahead of everyone else. People are lazy, and bothering to put in the time to actually learn stuff will stand out more and more. (Ditto for writing essays and other writing people are doing with AI.)

4. Taste continues to be the single most important thing. The vast, vast majority of AI art out there is...not very good. It's not going to get better, because the lack of taste isn't a technical problem.

5. Art with physical materials will become increasingly popular. That is, stuff that can't be digitized very well: sculpture, installation art, etc. Above all, AI art is uncool, which means it has no real future as a leading art form. This uncoolness will push people away from the screen and towards things that are more material.

Keeping track of the different AI product names is confusing, even within a single company.

Why can't Google, for example, just call them:

  Gemini Image = Nano Banana
  Gemini Video = Veo
  ...

I'm building my personal home right now. The AI image models have been a game-changer in designing the look of the house. My architect did an OK job, but the details that Nano Banana added really bring the house up a notch. I just do hundreds of renders from the basic 3D models, find looks that I like, and iterate from there. We are implementing the renders from Nano Banana over our interior designers' designs. Having used Nano Banana to do our interiors, we would not hire the interior designers again.

I think part of the issue with architects and designers today is that they use CAD too much. It's easy to design boxes and basic roof lines in CAD. It's harder to put in curves and more craftsman features. Nano Banana's renders have more organic design features IMO.

Our house is looking great and we're very happy with how it's going so far, with a lot of the thanks going to Nano Banana.

These image gen models are getting so advanced and lifelike that the general public is increasingly being duped into believing AI images are real (e.g. Facebook food images or fake OF models). Don't get me wrong, I will enjoy the benefits of using this model to express myself better than ever before, but I can't help feeling there's also something very insidious about these models.

I'm sure this has been written about but here's what happens long term - images are commoditized and lose their emotional appeal.

Probably about half of us here remember photos before the cell phone era. They were rare, and special, and you'd have a few photos per YEAR to look back on. The feel of photos back then was at least 100x stronger than now. They were a special item, and could be given as a gift. But once they became freely available, that same amount of emotion was split across many thousands of photos. (Not saying this is good or bad, just that increased supply reduces the value of each item.)

With image/art generation the same thing will happen, and I can already feel it happening. Things that used to be beautiful or fantastic looking now just feel flat and AI-ish. If claymation scenes can be generated in 1s, and I see a million claymation diagrams a year, then claymation will lose its charm. If I see a million fake Tom Cruise videos, then it oversaturates my desire for all Tom Cruise movies.

What a time to be alive.

What they've chosen as examples to illustrate the strength of the new model surprises me.

The "cubism" example seems like it would be a closer fit to something like stained glass. I don't think the thing really understands what cubism was all about. Cubist painters were trying to free themselves from the confines of a single integral plane of perspective by allowing themselves to show various parts of the image from different viewpoints, different times, different styles, etc.

The division of the image into geometric shapes is just a by-product of that quest, whereas the examples here have made it the sum total of the whole piece.

This feels to me like an example of how LLMs still don't "understand" what the art means, and are just aping its facade.

What a great thing this didn't exist in the past. We likely wouldn't have had any of the amazing artworks that we have now. Imagine an AI-generated Mona Lisa, Night Watch or Sistine Chapel ceiling, because prompting would have been so much cheaper than paying Leonardo, Rembrandt or Michelangelo...

Now extrapolate to all other artforms. Sculpture seems safe, for now, but only barely so.

Funny timing. I just migrated my personal styling app off of Nano Banana.

My main use case is editing user uploads to enhance their clothing images. A large part of it is preserving logos, graphics and other technical details. Over time, it felt like Nano Banana had gotten worse at this.

I have a test set of graphic t-shirts, and I noticed the model seemingly getting worse on it. This, combined with the price and the terrible experience of their cloud console, got me to migrate off.

This will stay useless for editing personal pictures so long as virtually every prompt with a person in it is met with "I can't edit images of some people". For whatever reason, they've made the celeb detection so ultra-aggressive that almost everyone is detected as a (lookalike) celeb.

I've only had a brief opportunity to try out NB Pro 2 (`gemini-3.1-flash-image-preview`), so I haven't had a chance to update GenAI Showdown.

Here are some of my captions that tend to trip up even state-of-the-art models.

https://mordenstar.com/other/nb-pro-2-tests

So far it does feel more iterative than an entirely new leap in terms of capabilities, but I haven't run it through the more multimodal aspects such as editing existing images.

That being said, it actually managed the King Louie jump rope test which surprised me.

I did some tests. My education is in digital imaging technology/film from 20 years ago, so I find this stuff fun to follow.

Two of what I would consider "interesting prompts" for image gen testing. It did pretty well.

https://s.h4x.club/eDuOzPDd

"A macro close-up photograph of an old watchmaker's hands carefully replacing a tiny gear inside a vintage pocket watch. The watch mechanism is partially submerged in a shallow dish of clear water, causing visible refraction and light caustics across the brass gears. A single drop of water is falling from a pair of steel tweezers, captured mid splash on the water's surface. Reflect the watchmaker's face, slightly distorted, in the curved glass of the watch face. Sharp focus throughout, natural window lighting from the left, shot on 100mm macro lens." - The only major problems I could find at a glance are that the clasps probably don't make sense, and that the drop of water inside the watch on the cog doesn't make sense (the cog is mangled into the tweezers).

https://s.h4x.club/yAuNPlRk

"A candid photograph taken from behind an elderly woman sitting alone on a park bench in late autumn. She is gently resting one hand on the empty seat beside her, where a man's weathered flat cap and a folded newspaper sit untouched. Fallen golden leaves cover the path ahead. The low afternoon sun casts her long shadow alongside a second, fainter shadow that almost seems to be there, the suggestion of someone sitting next to her, visible only in the light on the ground. Muted, warm color palette, shallow depth of field on the background trees, photojournalistic style." - I don't know why, but it hit an internal error twice on this one before it got there.

This looks like a response to Seedream 5.0 lite that was published two days ago.

I use all those fancy image models' editing capabilities for my fast fashion web shop. I must say: product photography for clothing and accessories is dead. Those models are amazing at style transfer and garment transfer.

We will see how good the full version of Seedream 5.0 will be.

I think this tech is cool, from an engineering perspective. I’m trying to figure out if there’s any justification for using it in a business world outside of: “We don’t want to pay an artist.”

You can argue things like code generation are an extension of the engineer wielding it. Image generation just seems like a net negative overall if it’s used at scale.

Edit: By scale, I mean large corporations putting content in front of millions. I understand the appeal for smaller businesses where they probably weren’t going to pay an artist anyway.

Since we're talking about images, are there any AI models that can output real transparent GIFs/PNGs?

And not a (botched) fake white/gray grid background that is commonly used to visualize transparency?

Model card: https://deepmind.google/models/model-cards/gemini-3-1-flash-...

Pretty close to Gemini 3 Pro Image (aka Nano Banana Pro) in most benchmarks, even without thinking+search, and even exceeding it in the two most important ones, 'Overall Preference' and 'Visual Quality'. I'm excited about the big jump in Infographics/Factuality (even without thinking+search; I'm surprised that text+image search grounding doesn't make an even bigger dent).

If any AI image generation companies are reading this, I want the image to be in layers which can also be exported, so I can 1) do post-processing of my own or 2) arrange for an AI image generation model to process just the layers I specify.

It's notable that this model is less advanced than the previous "Pro" model, and also that the Gemini interface is defaulting all requests to "Fast" even if you've previously changed it to Pro.

I guess even Google is running out of GPUs.

It still seems to have the same pitfalls as all the other image generation models. I ran it through my test prompt (wary of posting it here, lest it gets trained on) - it still cannot generate something along the lines of "object A, but with feature X from Y", where that combo has never been seen in the training data. I wonder how the "astronaut riding unicorn on the moon" was solved...

EDIT: after significant prompting, it actually solved it. I think it's the first one to do so in my testing.

Kind of surprised it hasn't been pulled yet. I have seen some very disturbing (Grok-tier) examples of completely bypassing whatever censors they have in place by simply asking Gemini to write the prompt.

I saw an item for sale in a video on AliExpress and I thought "Wow, they hired some really attractive actors to pitch their little gadget." 30 seconds in, I realized they used GenAI. Not because it looked AI, but because the production values looked too high and professional for the item. I would get in on this if you sell anything online.

I'm officially done with the Nano Banana name. It was fun, but can we go back to just calling it Gemini Image?

I really really want to see how these images are starting to form into videos. The stills are clearly getting better and better, but what about when you need the stills to organically conform to a keyed script?

How does it compare to Nano Banana Pro?

I have Google AI Ultra. Where can I test this? They say it's in AI Studio, which says it's a paid model and that I need to set up billing (as if paying for Ultra isn't enough). They say it's available in Antigravity, but I can't seem to find it there.

Interesting they get to rev this with the release of a new flash model. I'm speculating part of the distil pipeline includes the image gen stuff; that seems like internal tooling that will pay dividends over time, if true. New frontier model -> automatic new image model. Even if it's just incremental updates, it's good for both the product cadence and compounding improvements.

Google updated it early in AI Studio so I've been experimenting:

- Base pricing for a 1024x1024 image is about 1.7x what normal Nano Banana is ($0.067 vs. $0.039); however, you can now get a 512x512 image for cheaper, or a 4k image for cheaper than four 1k images: https://ai.google.dev/gemini-api/docs/pricing#gemini-3.1-fla...

- Thinking is now configurable between `Minimal` and `High` (was not the case with Nano Banana Pro)

- Safety of the model appears to be increased so typical copyright infringing/NSFW content is difficult to generate (it refused to let me generate cartoon characters having taken psychedelics)

- Generation speed is really slow (2-3min per image) but that may be due to load.

- Prompt adherence to my trickier prompts for Nano Banana Pro (https://minimaxir.com/2025/12/nano-banana-pro/) is much worse, unsurprisingly. For example I asked it to make a 5x2 grid with 10 given inputs and it keeps making 4x3 grids with duplicate inputs.
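
As a rough check on the pricing bullet above, the two quoted 1024x1024 prices can be compared directly. This is only a sketch: the function names are made up for illustration, and the only inputs are the $0.067 and $0.039 figures quoted above (the 512x512 and 4k prices are not given here, so they are left out).

```python
def price_ratio(new_price: float, old_price: float) -> float:
    """How many times more expensive the new per-image price is."""
    return new_price / old_price


def extra_cost(new_price: float, old_price: float, n_images: int) -> float:
    """Additional spend, in dollars, for n_images at the new price."""
    return round((new_price - old_price) * n_images, 2)


# Per-image prices for a 1024x1024 image, as quoted in the comment above.
NEW_1K_PRICE = 0.067  # the new model
OLD_1K_PRICE = 0.039  # base Nano Banana

if __name__ == "__main__":
    print(f"ratio: {price_ratio(NEW_1K_PRICE, OLD_1K_PRICE):.2f}x")
    print(f"extra per 1000 images: ${extra_cost(NEW_1K_PRICE, OLD_1K_PRICE, 1000):.2f}")
```

With those two figures the ratio comes out to roughly 1.72x, or about $28 more per thousand images.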

However, I am skeptical of their marquee feature: image search. Anyone who has used Nano Banana Pro for a while knows that it will strongly overfit on any input images by copy/pasting the subject without changes, which is bad for creativity, and I suspect this implementation behaves the same.

Additionally I have a test prompt which exploits the January 2025 knowledge cutoff:

    Generate a photo of the KPop Demon Hunters performing a concert at Golden Gate Park in their concert outfits.

That still fails even with Grounding with Google Search and Image Search enabled, and more charitable variants of the prompt.

tl;dr the example images (https://deepmind.google/models/gemini-image/flash/) seem similar to Nano Banana Pro, which is indeed a big quality improvement, but even relative to base Nano Banana it's unclear if it justifies a "2" subtitle, especially given the increased cost.

Adding to predictions: the magic of travel might actually be reborn, as people seek authentic experiences.

Would be interesting to see latency vs quality tradeoffs here. Are they targeting consumer-facing generation speed or prioritizing fidelity for professional workflows?

It would be great to find a web directory dedicated exclusively to good/useful prompts for Nano Banana.

The Chinese are so far ahead in this space; their models are way better at this stuff. For example, https://hunyuan.tencent.com/image/en?tabIndex=0 and https://seed.bytedance.com/en/seedream5_0_lite

Did gemini-2.5-flash-image get an upgrade as well? I just got the following, which is fascinating, and not something I've seen before:

> I'm sorry, but I cannot fulfill your request as it contains conflicting instructions. You asked me to include the self-carved markings on the character's right wrist and to show him clutching his electromancy focus, but you also explicitly stated, "Do NOT include any props, weapons, or objects in the character's hands - hands should be empty." This contradiction prevents me from generating the image as requested.

My prompts are automated (i.e. I'm not writing them) and have definitely contained conflicting instructions in the past.

A quick Google search on that error doesn't reveal anything either.

I’ve been exploring this exact problem space from the angle of extreme constraints (single-digit MB memory, no cloud assumptions). I documented what broke first and why here, in case it’s useful: https://github.com/nullclaw/nullclaw

Just what we need, more sloperators thinking they are being creative and making art by prompting.

I would be happy to never see any more AI slop.

I've only needed this banana boy's help twice, and it managed to disappoint me each time. Most recently, I was trying different beard and mustache styles on myself, on a photo I imported from my own Google photo gallery, and it consistently rejected me, claiming I'm a public figure. Nobody has ever told me that I look like any famous person, so that's Google's own bananination. ChatGPT handled the job nicely.

Any info or speculation about technical details?

Wow the article narration with Umbriel is silent after the 6 second mark.

So, I'd suspect a video model to compete with Seedance 2.0 is coming soon as well? ;)

It's not working very well at all. I started with a picture of a girl sitting at a cafe table and asked it to zoom in, and it enlarged her head to the size of a balloon.

Open weight? How many parameters?

Is this a distillation of Nano Banana Pro?

Still has context leaking into the text/random signs in the image, made worse by generating filler with an internal LLM.

Does it still break images with transparent pixels?

Can we now edit the images it spits out? All my prior attempts at editing AI images have failed miserably and laughably.

Is it just me, or is Nano Banana not working in Gemini currently?

I really wish they opened a version of this up for adult content. They would make immense amounts of money and it could be fenced off behind some sort of paywall where they could verify the age of the person.

It's working pretty well for generating an xkcd comic for your HN profile: https://hn-wrapped.kadoa.com/

The previous Nano Banana frequently made speech attribution errors; the new one seems a lot more consistent.

It's extremely slow; it takes several minutes to generate an image.

My naive question: can image generation make something novel, e.g. "show me a DNA structure that cures cancer"? Can it do that, or does it have to have seen something before to generate it?

Just think: we conceptually know what a brushless motor design looks like, and it's just pixels. I guess even if it did produce the image, we wouldn't know what it means.

I love your website, art and projects!