It isn't clear from the article whether the time they quote is time-to-first-token or time-to-completion. If it's the latter, then it makes sense why the Gemini models would take longer even with similar token throughput.
https://ghostarchive.org/archive/JlE5T
https://web.archive.org/web/20250812172455/https://every.to/...
Gemini has done this in ways that I haven't seen in the recent or current generation models from OpenAI or Anthropic.
It really surprised me that Gemini performs so well in multi-turn benchmarks, given that tendency.
Claude Sonnet 4 now supports 1M tokens of context - https://news.ycombinator.com/item?id=44878147 - Aug 2025 (160 comments)
[1]: https://www.imdb.com/title/tt0766092/quotes/?item=qt1440870
Using the GPT-4 tokenizer (cl100k_base) yields 349,371 tokens.
Recent Google and Anthropic models don't ship local tokenizers and ridiculously make you call their APIs to count tokens, so no idea about those.
Just thought that was interesting.
https://ai.google.dev/gemini-api/docs/models (context window details are listed under each model variant's section, behind the + signs)
They were meant to crank 2.5 up to 2 million tokens at some point, though; maybe they're now waiting until 3?
I assumed that because I'm on paid tiers it would only start costing beyond a certain usage amount, but I guess not.
I'm using Claude Pro as my daily driver, and the Gemini / ChatGPT free tiers.
Not on AI Studio.
There is some information you assume you've shared that we're not picking up on.