Gemini 3.1 Flash was actually an amazing model to code with and their 20 dollar AI plans had solid value, but they locked it all behind 429s, needless gatekeeping of clients and poor product differentiation even among internal offerings. Users moved on: to Claude for the best product, to OpenAI for the non-gatekept API access. It’s hard to bring them back.
> Thanks John for an extraordinary partnership and wonderful collaboration over the past 9 years! What we achieved with AlphaFold changed the world, and showed the field what was possible with AI for science and medicine, lighting the way for how AI can benefit humanity.
Their newest model wasn’t really SOTA. And honestly Fable 5 was the most human-like model I’d ever tried. It was an incredible jump.
And recently lots of Claude users at r/ClaudeAI are noticing Opus 4.8 has really increased in capability. Not new things but maybe redirected compute. It just feels like one of the best models ever, maybe because the compute that was previously assigned to Fable has been redirected? It feels incredible.
When personal finance is not the bottleneck anymore, the new criteria become "vision" and "stacked talent".
Seems like everyone here is easily fooled by the Anthropic hype. After the IPO, Anthropic won't be like the daycare it is today.
Their main competitors are the Chinese labs, which are racing all their prices down close to $0.
Demis is the CEO of DeepMind, it's completely different.
Jumper.. the AlphaFold team left & made Isomorphic. I was always surprised that Jumper hadn't gone with them.
Extreme investor desire for return on capital investment, and quickly
It also feels like Anthropic is the new Google though. They actually try to not be evil, and are actually at the frontier of new tech.
Is Gemini good at writing code? I am sure it is. But where is their Codex? And no, Antigravity isn't it.
Is Gemini good at making visualizations? I am sure it is. But where is an artifact or visualise skill on gemini.google.com, similar to what's available on claude.ai?
What is an average user going to do with raw model capability if the product surface isn't expressive enough?
I've definitely noticed it, at least for doing backend C#/dotnet. It's insanely good, I haven't had to babysit much at all this week.
If he walks the talk, I really do not understand how either OpenAI or Anthropic is going to justify the twelve-digit valuations they are hoping for. They will just be some people who bought a domain name and rented some GPUs.
Not a bad playbook. If you’re important to the company, leave and start your own company. Then play the M&A game and you can clean up nicely.
That was when they realized the deep learning was largely unnecessary, and they could just use their massive compute resources to brute force the problem space.
Proving that we would greatly benefit from using our compute resources for science rather than showing ads, and then we just kept showing ads.
I've been in NLP since the LSTM days and it's hard for me to look at LLMs and not just think they are incredible. It's truly a different level of expressiveness. So much of capabilities research is pointing to LLMs effectively learning a world model.
RLVR is also proving really effective. It is hard for me to imagine a world in the future where LLMs aren't at human level performance across a wide variety of tasks.
I fully acknowledge that current LLM labs have a financial interest in people believing AGI is very near, but from what I'm reading in the literature and seeing myself experimenting with the SOTA models it doesn't seem totally unreasonable.
What evidence are you seeing that makes you confident that AGI in the soon-ish future is a complete myth?
https://artificialanalysis.ai/articles/glm-5-2-is-the-new-le...
The idea of "falling behind" when you can leapfrog each other every six months leads me to believe it has to be more than just "falling behind" for one cycle. It's a culture, process, red tape, focus, or mandate problem of some sort. Something not as easily correctable while preparing for the next launch.
So maximal safety at all costs is in itself a cost. They can spend billions on AI but that spend is down the toilet if the user bounces because the AI's persona is a relentless politically correct scold.
Thank God. I'd rather companies ship something when engineers say it's actually ready than when the suits want something to show on stage to pump their egos and career exposure, only for it to turn out to be a massive disappointment covered in fluff.
Although it does feel very embarrassing for Google, which invented transformers and has more money than Anthropic and OpenAI combined, to fall behind them in the LLM race.
The question is, how far ahead will the frontier models be in 6 months? If it's still 6 months, open weights might have a Fable-equivalent model, and the frontier models will be on upwards towards ... essay, or novel, or bibliography, or whatever the next name is.
Define "better". I guess it depends on what you're using it for. I use it almost daily as an alternative to Google search and it's great for that, but I think it's absolute garbage for coding and reasoning.
For questions related to coding, solving Arch Linux and WINE Lutris issues, helping me with MX Linux issues, and wifi issues on an old rooted Huawei tablet running LineageOS, it was consistently wrong, constantly giving out confident but outdated information or misinformation, or hallucinating stuff while gaslighting me. Every time I pointed out it was wrong, it would re-check, apologize, and then give me wrong answers again, then apologize again, and so on. It doesn't matter what prompts or jailbreaks you give it to get 3.5 Flash to chew longer on complex problems for better reasoning and accuracy, it just defaults to being lazy and giving you the quick and easy answer from its weights, which can be totally wrong. Same for asking it to write me a cover letter based on my resume and the description of the job I wanted to apply to. It massively sucked at that too and made up a bunch of unusable, fake-sounding BS.
Basic free tier ChatGPT 5.5 would blow it out of the water on all of those tasks. Hell, even Grok free is better at that, it gave me one-shot Arduino code that blew Gemini 3.5 Flash away.
3.5 Flash seems tuned to just eyeballing basic answers to general purpose questions that resemble Google searches like "give me a recipe", "give me a workout plan", or "what's the difference between Arch- and Fedora-based distros", not to solving complex issues that require cognition and accuracy. That's what 3.1 Pro is better for, according to Gemini. Oh, and it also gaslights you by starting its answers by telling you how amazing the things in your question are, which is insanely annoying, but I guess Google's A/B testing found out the majority of Average Joe midwits love it when "the AI" reinforces their choices and decisions like a fake friend.
I think Google just doesn't care about being the SOTA for coding, reasoning and accuracy, since they're in the ads and search business for everyone, not in the agentic coding business for prosumers. So if the answers are hallucinations that sound "good enough" to its clueless search user base, but are at least dirt cheap to run on their datacenter hardware, then that's already more than enough for them and they can call it a day.
Meanwhile OpenAI and Anthropic don't have search and ads monopolies, so they need to perform well at certain tasks for people and businesses to give them the hard earned money they need to survive. For them, nailing stuff like coding and writing accuracy is an existential matter, not a hobby side project like it is for Google.
Google seems more interested in fast models that can quickly turn around responses, which kind of fits with a company that needs to serve AI on a mass scale.
It out-performed every model that wasn't a max/ultrafrontier of some sort, except for the one that the article was extolling the virtues of, including Grok high. You could make a good argument that DeepSeek is a better value, but Gemini Flash, when bundled, is already pretty accessible.
nowhere did i claim that flash was better than fable or 5.5xhigh.
I don't care about someone else's charts, I care about my own lived experiences. Benchmarks can be gamed to get to the top of charts. When I pay for a service I care about how it performs in my test cases, not about which model tops some random charts.
Read my comment again please. I think I was pretty clear with detailed examples on where Gemini sucks and what it's good at.
>nowhere did i claim that flash was better than fable or 5.5xhigh.
And nowhere did I claim that. I said even basic GPT and Grok are better than Gemini Flash at reasoning tasks. Read my comment again, I have already explained why with examples.
Fast answers, using their search as grounding, that can parse keywords and spit out a few ads is where Gemini Flash is going to head. That, and the agentic actions stuff they showed off at I/O with Google shopping, ordering food, etc. Speed is important there.