I have been playing with it for the past few weeks, and it's genuinely my new favorite; it's so fast and has such vast world knowledge that it's more performant than Claude Opus 4.5 or GPT 5.2 extra high, for a fraction (basically an order of magnitude less!) of the inference time and price.
It's 1/4 the price of Gemini 3 Pro ≤200k and 1/8 the price of Gemini 3 Pro >200k - notable that the new Flash model doesn’t have a price increase after that 200,000 token point.
It’s also twice the price of GPT-5 Mini for input, half the price of Claude 4.5 Haiku.
For comparison, from 2.5 Pro ($1.25 / $10) to 3 Pro ($2 / $12), there was a 60% increase in input token pricing and a 20% increase in output token pricing.
They are pushing the prices higher with each release, though: API pricing is up to $0.50/M for input and $3/M for output.
For comparison:
Gemini 3.0 Flash: $0.50/M for input and $3.00/M for output
Gemini 2.5 Flash: $0.30/M for input and $2.50/M for output
Gemini 2.0 Flash: $0.15/M for input and $0.60/M for output
Gemini 1.5 Flash: $0.075/M for input and $0.30/M for output (after price drop)
Gemini 3.0 Pro: $2.00/M for input and $12/M for output
Gemini 2.5 Pro: $1.25/M for input and $10/M for output
Gemini 1.5 Pro: $1.25/M for input and $5/M for output
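To put those per-million prices in per-request terms, here is a rough sketch (prices are the ones listed above; the token counts are hypothetical examples, not measurements):

    // Rough cost-per-request sketch using the per-million-token prices listed above.
    // Token counts below are made-up examples.
    type Price = { input: number; output: number }; // USD per 1M tokens

    const prices: Record<string, Price> = {
      "gemini-3-flash": { input: 0.50, output: 3.00 },
      "gemini-2.5-flash": { input: 0.30, output: 2.50 },
      "gemini-3-pro": { input: 2.00, output: 12.00 },
      "gemini-2.5-pro": { input: 1.25, output: 10.00 },
    };

    function costUSD(model: string, inputTokens: number, outputTokens: number): number {
      const p = prices[model];
      return (inputTokens / 1e6) * p.input + (outputTokens / 1e6) * p.output;
    }

    // Example: a 20k-token prompt with a 2k-token answer.
    for (const model of Object.keys(prices)) {
      console.log(model, "$" + costUSD(model, 20_000, 2_000).toFixed(4));
    }
    // gemini-3-flash comes out to $0.0160, gemini-3-pro to $0.0640, and so on.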
I think image input pricing went up even more.
Correction: It is a preview model...
With this release, "good enough" and "cheap enough" intersect so hard that I wonder if this is an existential threat to those other companies.
Developer Blog: https://blog.google/technology/developers/build-with-gemini-...
Model Card [pdf]: https://deepmind.google/models/model-cards/gemini-3-flash/
Gemini 3 Flash in Search AI mode: https://blog.google/products/search/google-ai-mode-update-ge...
Flash is meant to be a model for lower-cost, latency-sensitive tasks. Long thinking times would both push TTFT well past 10s (often unacceptable) and also wouldn't really be that cheap.
I'm speculating, but Google might have figured out some training trick to balance out how information is stored in the model's capacity. Either that, or this Flash model has a huge number of parameters or something.
Is there an OSS model that's better than 2.0 flash with similar pricing, speed and a 1m context window?
Edit: this is not the typical flash model, it's actually an insane value if the benchmarks match real world usage.
> Gemini 3 Flash achieves a score of 78%, outperforming not only the 2.5 series, but also Gemini 3 Pro. It strikes an ideal balance for agentic coding, production-ready systems and responsive interactive applications.
The replacement for the old Flash models will probably be 3.0 Flash Lite, then.
It's almost as good as 5.2 and 4.5, but way faster and cheaper.
I just always thought the taste of gpt or claude models was more interesting in the professional context and their end user chat experience more polished.
are there obvious enterprise use cases where gemini models shine?
Hoping that the local ones (the Gemma line) keep pace.
> Gemini 3 Flash is able to modulate how much it thinks. It may think longer for more complex use cases, but it also uses 30% fewer tokens on average than 2.5 Pro.
I assume that these are just different reasoning levels for Gemini 3, but I can't even find mention of there being 2 versions anywhere, and the API doesn't even mention the Thinking-Pro dichotomy.
Pipe dream right now, but 50 years later? Maybe
Just avoiding/fixing that would probably speed up a good chunk of my own queries.
The image model they have released is much worse than Nano Banana Pro; the Ghibli moment did not happen.
The consensus among many developers and friends I know is that GPT 5.2 is obviously overfit on benchmarks. So Opus 4.5 is staying on top when it comes to coding.
The weight of the ad money from Google, plus the general direction and founder sense of Brin, brought the massive giant back to life. None of my company's workflows run on OpenAI's GPT right now. Even though we love their Agents SDK, after the Claude Agent SDK it feels like peanuts.
Now, imagine for a moment they had also vertically integrated the hardware to do this.
1, has anyone actually found 3 Pro better than 2.5 (on non code tasks)? I struggle to find a difference beyond the quicker reasoning time and fewer tokens.
2, has anyone found any non-thinking models better than 2.5 or 3 Pro? So far I find the thinking ones significantly ahead of non thinking models (of any company for that matter.)
Google keeps their models very "fresh", and I tend to get more correct answers when asking about Azure or O365 issues; ironically, Copilot will talk about now-deleted or deprecated features more often.
ChatGPT still has 81% market share as of this very moment, vs Gemini's ~2%, and arguably still provides the best UX and branding.
Everyone and their grandma knows "ChatGPT", who outside developers' bubble has even heard of Gemini Flash?
Yea I don't think that dynamic is switching any time soon.
Turns out Gemini 3 Flash is pretty close. The Gemini CLI is not as good but the model more than makes up for it.
The weird part is that Gemini 3 Pro is nowhere near as good an experience. Maybe because it's just so slow.
Also, I hate that I cannot put the Google models into a "Thinking" mode like in ChatGPT. When I send GPT 5.1 Thinking a legal task and tell it to check and cite all sources, it takes 10+ minutes to answer, but it did check everything and cite all its sources in the text; whereas the Gemini models, even 3 Pro, always answer after a few seconds and never cite their sources, making it impossible to click through and check the answer. That makes the whole model unusable for these tasks. (I have the $20 subscription for both.)
Skatval is a small local area I live in, so I know when it's bullshitting. Usually, I get a long-winded answer that is PURE Barnum-statement, like "Skatval is a rural area known for its beautiful fields and mountains" and bla bla bla.
Even with minimal thinking (it seems to do none), it gives an extremely good answer. I am really happy about this.
I also noticed it had VERY good scores on tool-use, terminal, and agentic stuff. If that is TRUE, it might be awesome for coding.
I'm tentatively optimistic about this.
I'm more excited to see 3 Flash Lite. Gemini 2.5 Flash Lite needs a lot more steering than regular 2.5 Flash, but it is a very capable model and combined with the 50% batch mode discount it is CHEAP ($0.05/$0.20).
After reading your comment I ran my product benchmark against 2.5 flash, 2.5 pro and 3.0 flash.
The results are better AND the response times have stayed the same. What an insane gain - especially considering the price compared to 2.5 Pro. I'm about to get much better results for 1/3rd of the price. Not sure what magic Google did here, but would love to hear a more technical deep dive comparing what they do different in Pro and Flash models to achieve such a performance.
Also wondering, how did you get early access? I'm using the Gemini API quite a lot and have a quite nice internal benchmark suite for it, so would love to toy with the new ones as they come out.
Claude has been a coding model from the start, but GPT is more and more becoming a coding model too.
When I ask Gemini 3 Flash this question, the answer is vague but agency comes up a lot. Gemini thinking is always triggered by a query.
This seems like a higher-level programming issue to me. Turn it into a loop. Keep the context. Those two things make it costly for sure. But does it make it an AGI? Surely Google has tried this?
- "Thinking" is Gemini 3 Flash with higher "thinking_level"
- Pro is Gemini 3 Pro. It doesn't mention "thinking_level", but I assume it is set to high-ish.
https://deepmind.google/models/gemini-robotics/
Previous discussions: https://news.ycombinator.com/item?id=43344082
Fast = Gemini 3 Flash without thinking (or very low thinking budget)
Thinking = Gemini 3 flash with high thinking budget
Pro = Gemini 3 Pro with thinking
Dec 17, 2025
Gemini 3 Flash is our latest model with frontier intelligence built for speed that helps everyone learn, build, and plan anything — faster.
Google is releasing Gemini 3 Flash, a fast and cost-effective model built for speed. You can now access Gemini 3 Flash through the Gemini app and AI Mode in Search. Developers can access it via the Gemini API in Google AI Studio, Google Antigravity, Gemini CLI, Android Studio, Vertex AI and Gemini Enterprise.
Today, we're expanding the Gemini 3 model family with the release of Gemini 3 Flash, which offers frontier intelligence built for speed at a fraction of the cost. With this release, we’re making Gemini 3’s next-generation intelligence accessible to everyone across Google products.
Last month, we kicked off Gemini 3 with Gemini 3 Pro and Gemini 3 Deep Think mode, and the response has been incredible. Since launch day, we have been processing over 1T tokens per day on our API. We’ve seen you use Gemini 3 to vibe code simulations to learn about complex topics, build and design interactive games and understand all types of multimodal content.
With Gemini 3, we introduced frontier performance across complex reasoning, multimodal and vision understanding and agentic and vibe coding tasks. Gemini 3 Flash retains this foundation, combining Gemini 3's Pro-grade reasoning with Flash-level latency, efficiency and cost. It not only enables everyday tasks with improved reasoning, but also is our most impressive model for agentic workflows.
Starting today, Gemini 3 Flash is rolling out to millions of people globally:
Gemini 3 Flash demonstrates that speed and scale don’t have to come at the cost of intelligence. It delivers frontier performance on PhD-level reasoning and knowledge benchmarks like GPQA Diamond (90.4%) and Humanity’s Last Exam (33.7% without tools), rivaling larger frontier models, and significantly outperforming even the best 2.5 model, Gemini 2.5 Pro, across a number of benchmarks. It also reaches state-of-the-art performance with an impressive score of 81.2% on MMMU Pro, comparable to Gemini 3 Pro.

In addition to its frontier-level reasoning and multimodal capabilities, Gemini 3 Flash was built to be highly efficient, pushing the Pareto frontier of quality vs. cost and speed. When processing at the highest thinking level, Gemini 3 Flash is able to modulate how much it thinks. It may think longer for more complex use cases, but it also uses 30% fewer tokens on average than 2.5 Pro, as measured on typical traffic, to accurately complete everyday tasks with higher performance.
Gemini 3 Flash pushes the Pareto frontier on performance vs. cost and speed.

Gemini 3 Flash’s strength lies in its raw speed, building on the Flash series that developers and consumers already love. It outperforms 2.5 Pro while being 3x faster (based on Artificial Analysis benchmarking) at a fraction of the cost. Gemini 3 Flash is priced at $0.50/1M input tokens and $3/1M output tokens (audio input remains at $1/1M input tokens).
Gemini 3 Flash is made for iterative development, offering Gemini 3’s Pro-grade coding performance with low latency — it’s able to reason and solve tasks quickly in high-frequency workflows. On SWE-bench Verified, a benchmark for evaluating coding agent capabilities, Gemini 3 Flash achieves a score of 78%, outperforming not only the 2.5 series, but also Gemini 3 Pro. It strikes an ideal balance for agentic coding, production-ready systems and responsive interactive applications.
Gemini 3 Flash’s strong performance in reasoning, tool use and multimodal capabilities is ideal for developers looking to do more complex video analysis, data extraction and visual Q&A, which means it can enable more intelligent applications — like in-game assistants or A/B test experiments — that demand both quick answers and deep reasoning.
We’ve received a tremendous response from companies using Gemini 3 Flash. Companies like JetBrains, Bridgewater Associates, and Figma are already using it to transform their businesses, recognizing how its inference speed, efficiency and reasoning capabilities perform on par with larger models. Gemini 3 Flash is available today to enterprises via Vertex AI and Gemini Enterprise.
Gemini 3 Flash is now the default model in the Gemini app, replacing 2.5 Flash. That means all of our Gemini users globally will get access to the Gemini 3 experience at no cost, giving their everyday tasks a major upgrade.
Because of Gemini 3 Flash’s incredible multimodal reasoning capabilities, you can use it to help you see, hear and understand any type of information faster. For example, you can ask Gemini to understand your videos and images and turn that content into a helpful and actionable plan in just a few seconds.
Or you can quickly build fun, useful apps from scratch using your voice without prior coding knowledge. Just dictate to Gemini on the go, and it can transform your unstructured thoughts into a functioning app in minutes.
Gemini 3 Flash is also starting to roll out as the default model for AI Mode in Search with access to everyone around the world.
Building on the reasoning capabilities of Gemini 3 Pro, AI Mode with Gemini 3 Flash is more powerful at parsing the nuances of your question. It considers each aspect of your query to serve thoughtful, comprehensive responses that are visually digestible — pulling real-time local information and helpful links from across the web. The result effectively combines research with immediate action: you get an intelligently organized breakdown alongside specific recommendations — at the speed of Search.
This shines when tackling complex goals with multiple considerations like trying to plan a last-minute trip or learning complex educational concepts quickly.
Gemini 3 Flash is available now in preview via the Gemini API in Google AI Studio, Google Antigravity, Vertex AI and Gemini Enterprise. You can also access it through other developer tools like Gemini CLI and Android Studio. It’s also starting to roll out to everyone in the Gemini app and AI Mode in Search, bringing fast access to next-generation intelligence at no cost.
We’re looking forward to seeing what you bring to life with this expanded family of models: Gemini 3 Pro, Gemini 3 Deep Think and now, Gemini 3 Flash.
I asked it to summarize a recent, working arxiv URL.
And then it tells me the date is from the future and it simply refuses to fetch the URL.
so they get lapped a few times and then drop a fantastic new model out of nowhere
the same is going to happen to Google again, Anthropic again, OpenAI again, Meta again, etc
They're all shuffling the same talent around; it's California, that's how it goes. The companies have the same institutional knowledge - at least regarding their consumer-facing options.
thinkingConfig: { thinkingLevel: "low", }
More about it here https://ai.google.dev/gemini-api/docs/gemini-3#new_api_featu...
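For what it's worth, here is a minimal sketch of how that looks with the @google/genai JS SDK. The model id below is my assumption for the preview name, so double-check it against the docs linked above:

    // Minimal sketch with the @google/genai JS SDK; the model id is a guess
    // at the preview name and may differ from the real one.
    import { GoogleGenAI } from "@google/genai";

    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

    const response = await ai.models.generateContent({
      model: "gemini-3-flash-preview",            // assumed preview model id
      contents: "Summarize this changelog in two sentences: ...",
      config: {
        thinkingConfig: { thinkingLevel: "low" }, // "low" keeps latency down; raise it for harder tasks
      },
    });

    console.log(response.text);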
https://artificialanalysis.ai/evaluations/omniscience
Prepare to be amazed
More experts with a lower percentage of active ones -> more sparsity.
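Toy numbers (entirely made up, not Gemini's actual architecture) to illustrate the arithmetic:

    // Illustrative only: "more experts, fewer active per token" means a smaller
    // fraction of expert parameters fires on each forward pass.
    const totalExperts = 128;
    const activePerToken = 8;
    const activeFraction = activePerToken / totalExperts; // 0.0625
    console.log(`${(activeFraction * 100).toFixed(1)}% of expert params active per token`); // 6.3%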
So if 2.5 Pro was good for your use case, you just got a better model for about 1/3rd of the price, but it might hurt the wallet a bit more if you currently use 2.5 Flash and want an upgrade - which is fair, tbh.
You say good enough. Great, but what if I as a malicious person were to just make a bunch of internet pages containing things that are blatantly wrong, to trick LLMs?
Kara Swisher recently compared OpenAI to Netscape.
The reason this matters is that slowing velocity raises the risk of featurization, which undermines LLMs as a consumer category. The cost efficiency of the Flash models reinforces this, since Google can embed LLM functionality into Search (noting that search-like queries are probably 50% of ChatGPT usage, per their July user study). I think model capability was saturated for the average consumer use case months ago, if not longer, so distribution is really what matters, and Search dwarfs LLMs in this respect.
https://techcrunch.com/2025/12/05/chatgpts-user-growth-has-s...
The most terrifying thing would be Google expanding its free tiers.
I do feel like it's not an entirely accurate caricature (recency bias? limited context?), but it's close enough.
Good work!
You should do a "show HN" if you're not worried about it costing you too much.
Might be using Flash for my MCP research/transcriber/minor-tasks model over Haiku now, though (will test, of course).
This has been true for at least 4 months and yeah, based on how these things scale and also Google's capital + in-house hardware advantages, it's probably insurmountable.
Ghibli moment was only about half a year ago. At that moment, OpenAI was so far ahead in terms of image editing. Now it's behind for a few months and "it can't be reversed"?
Then you realise you aren't imagining it.
Just do it.
I use a service where I have access to all SOTA models and many open-source models, so I change models within chats, using MCPs - e.g. start a chat with Opus making a search with Perplexity and Grok DeepSearch MCPs and Google Search, the next query with GPT 5 Thinking xhigh, the next with Gemini 3 Pro, all in the same conversation. It's fantastic! I can't imagine going back to being locked into one (or two) companies. I have nothing to do with the guys who run it (the hosts of the podcast This Day in AI), but if you're interested, have a look in the simtheory.ai Discord.
I don't know how people who use one service can manage...
Definitely has not been my experience using 3 Pro in Gemini Enterprise - in fact, just yesterday it took so long on a similar task that I thought something was broken. Nope, just re-checking a source.
Maybe someday future models will all behave similarly given the same prompt, but we're not quite there yet
Google has been discontinuing older models after several months of transition period so I would expect the same for the 2.5 models. But that process only starts when the release version of 3 models is out (pro and flash are in preview right now).
You really need to look at the cost per task. artificialanalysis.ai has a good composite score, measures the cost of running all the benchmarks, and has a 2D intelligence vs. cost graph.
Opus and Sonnet are slower than Haiku. For lots of less sophisticated tasks, you benefit from the speed.
All vendors do this. You need smaller models that you can rapid-fire for lots of other reasons than vibe coding.
Personally, I actually use more smaller models than the sophisticated ones. Lots of small automations.
For example, the Gemini 3 Pro collection: https://blog.google/products/gemini/gemini-3-collection/
But having everything linked at the bottom of the announcement post itself would be really great too!
It's when it becomes difficult, like in the coding case that you mentioned, that we can see OpenAI still has the lead. The same is true for the image model: prompt adherence is significantly better than Nano Banana, especially on more complex queries.
The merger happened in April 2023.
Gemini 1.0 was released in Dec 2023, and the progress since then has been rapid and impressive.
Well worth every penny now
Just tried once again with the exact same prompt: GPT-5.1-Thinking took 12m46s and Gemini 3.0 Pro took about 20 seconds. The latter obviously has a dramatically worse answer as a result.
(Also, the thinking trace is not in the correct language, and doesn't seem to show which sources have been read at which steps- there is only a "Sources" tab at the end of the answer.)
But for anyone using LLM's to help speed up academic literature reviews where every detail matters, or coding where every detail matters, or anything technical where every detail matters -- the differences very much matter. And benchmarks serve just to confirm your personal experience anyways, as the differences between models becomes extremely apparent when you're working in a niche sub-subfield and one model is showing glaring informational or logical errors and another mostly gets it right.
And then there's a strong possibility that as experts start to say "I always trust <LLM name> more", that halo effect spreads to ordinary consumers who can't tell the difference themselves but want to make sure they use "the best" -- at least for their homework. (For their AI boyfriends and girlfriends, other metrics are probably at play...)
Edit: And just to add an example: openAI's Codex CLI billing is easy for me. I just sign up for the base package, and then add extra credits which I automatically use once I'm through my weekly allowance. With Gemini CLI I'm using my oauth account, and then having to rotate API keys once I've used that up.
Also, Gemini CLI loves spewing out its own chain of thought when it gets into a weird state.
Also Gemini CLI has an insane bias to action that is almost insurmountable. DO NOT START THE NEXT STAGE still has it starting the next stage.
Also Gemini CLI has been terrible at visibility on what it's actually doing at each step - although that seems a bit improved with this new model today.
Founders are special, because they are not beholden to this social support network to stay in power, and founders have a mythos that socially supports their actions beyond their pure power position. The only others they are beholden to are their co-founders, and in some cases major investor groups. This gives them the ability to disregard this social balance, because they are not dependent on it to stay in power. Their power source is external to the organization, while everyone else's is internal to it.
This gives them a very special "do something" ability that nobody else has. It can lead to failures (Zuck & Oculus, Snapchat Spectacles) or successes (Steve Jobs, Gemini AI), but either way, it allows them to actually "do something".
Google is great on the data science alone; everything else is an afterthought.
>Fast = 3 Flash
>Thinking = 3 Flash (with thinking)
>Pro = 3 Pro (with thinking)
My logic test, and trying to get an agent to develop a certain type of CRDT implementation (one that is published, so the model is trained on it to some limited extent), really stress-test models; 5.2 is a complete failure of overfitting.
Really really bad in an unrecoverable infinite loop way.
It helps when you have existing working code that you know a model can't be trained on.
I don't view this as a "new Flash" but as "a much cheaper Gemini 3 Pro/GPT-5.2"
In fact, so far they consistently fail in exactly these scenarios, glossing over random important details whenever you double-check results in depth.
You might have found models, prompts or workflows that work for you though, I'm interested.
We've seen this movie before. Snapchat was the darling. In fact, it invented the entire category and dominated the format for years. Then it ran out of time.
Now very few people use Snapchat, and it has been reduced to a footnote in history.
If you think I'm exaggerating, that just proves my point.
Just go outside the bubble, and ask slightly older people.
Of course they are. Founders get fired all the time. As often as non-founder CEOs purge competition from their peers.
> The only others they are beholden too are their co-founders, and in some cases major investor groups
This describes very few successful executives. You can have your co-founders and investors on board, if your talent and customers hate you, they’ll fuck off.
"And then imagine Google designing silicon that doesn’t trail the industry."
I'm def not a Google stan generally, but uh, have you even been paying attention?