>"Ask Meta AI..." placeholder.
>Colourful blue Send button.
>Eager to try, entering question... hitting Send.
>Log in or create an account to access.
>15 seconds of loading time
>Continue with Facebook or Instagram
Typical Meta move: throwing a dark pattern at you from the beginning instead of just letting you try it.
I won't even bother to continue; somehow OpenAI got this right.
I just posed the identical prompt/document to Muse Spark and it knocked it out of the park: it extracted and displayed the pertinent pages from a multi-page PDF inline in the chat and rendered a correct answer.
This may be a one-off or a lucky start, but given the incredible result out of the gate I'm optimistic. I will continue testing in parallel against other models before potentially making it my primary daily driver, excluding coding, where the harnesses of Claude Code and Codex are still needed (although hopefully they release something in this space too).
That being said, Meta has the most adversarial data-usage policies I've seen among LLM providers, so that's unfortunate for handling anything sensitive. But it also stands to reason that they have a long-term advantage with such a massive proprietary data set. I'd prefer to also have a paid plan, like the other services offer, that lets me keep my data out of training, rather than a free service where my usage is monetized in other ways.
What could have been interesting has been reduced to simply another subpar LLM release.
I tried multiple riddles, graphs, and questions I know some LLMs fail at, but this one seems to do well. Still, I don't have much trust in Meta after the scandal of them fiddling with their previous models to make them look good.
(I'm not using it as I'm not agreeing to their ad terms).
> Think longer to solve harder problems
> Compress
> Think longer again
I also had a poke around with the tools exposed on https://meta.ai/ and they're pretty cool: there's a Code Interpreter Python container thing now, and they also have an image analysis tool called "container.visual_grounding" which is a lot of fun.
While it's true that Llama 4 sucked, I still can't help feeling they have lost ground compared to where they would have been if they had maintained that strategy. Thanks to Llama, they were considered a peer of the other frontier model providers. Now they are not even in the conversation. It would take an incredible shift in performance to make me even consider using their new model. They may have a model, but the other providers have been busy building whole ecosystems around their tech, and Meta has none of that.
Maybe they could dump $1b into OpenCode or something and reignite the open ecosystem play with an open harness. They need something to get back in the conversation, if that's where they want to be. Otherwise, it will just be another closed, proprietary AI model driving user-facing Meta apps that nobody else cares about.
Meta hasn’t fully caught up, but they came close and I think can solidly claim to be a frontier lab again. I’d call it a 3.5 horse race right now, and hopefully their next model improves. More model competition is good!
Poor Grok 4.2 should probably be dropped from the table.
Do they mean "the chain of thought is visible to the user" (i.e. not hidden like ChatGPT's), or "the medium of the chain of thought is not text, but visuals" (i.e. thinking in images)?
I'd guess the former, since it wouldn't be economical to generate transient images just for thinking. But I'm not sure why they'd highlight that in that case. If it were the second thing, that would be extremely interesting: the first model not to think in text.
Finding it a little tricky to evaluate because the harness is unfortunately very, very bad (e.g. search is awful). Can't wait to try this in some real external services where we can see how it performs for real.
Definitely getting high-quality results in ordinary use, overall. But it's hard to test agentic behavior, and even hard to test prose quality, when just working off of the default chat interface.
One thing that stands out is that _for_ the quality it feels very, very fast. Perhaps it's just only very lightly loaded right now, but irrespective it's lovely to feel.
I'm quite impressed with the tone overall. It definitely feels much more like Opus than it does, like, GPT or Grok in the sense that the style is conversational, natural and enjoyable.
How does one get their hands on these models? They are not open-source, right? I go to meta.ai, but it's just a chat interface; no equivalent to Codex or Claude Code? Can you use this through OpenCode? Is Meta charging for model access, or is the gathering of chat data a sufficiently large tithe?
This article is about Meta, not about the user. Who signs off on these? Is the intended audience other people at Meta, not the user?
Especially looking at these numbers after Claude Mythos, it feels like either Anthropic has some secret sauce, or everyone else just looks dumber next to the talent Anthropic has.
Not sure what this is now.
I Googled it and found absolutely nothing.
Well, to be honest, I got 100% of websites containing the French word "boîtier" (box) with a typo.
Even on Google Scholar, the closest match is "BioTiER (Biological Training in Education and Research) Scholars Program", which is at least 10 years old and has nothing to do with that.
Is that an AI-generated image with an AI-generated name that has no physical existence?
We spent time yesterday arguing through an architecture decision. Today I ask the Agent to help implement it - it knows nothing about any of that. You’re effectively starting over.
Feels like the real problem isn’t intelligence, it’s continuity. And most benchmarks don’t even touch that.
If Meta wants to be seen as a cutting edge massive lab they need to come across as one instead of looking like a school project version of a frontier model.
It nailed all the ChatGPT meme gotchas (walk to the carwash, Alice 50 brothers, upside down cup, R's in strawberry, which number is bigger, 9.11 or 9.9?)
I guess all that money poaching OpenAI / Anthropic talent went somewhere...
Now, would I use "Meta Muse Code" or "Muse CoWork" if I have to get a Facebook account for all of my developers? Maybe not.
Would I use it via an API key? I might, depends on the pricing!
Also, I think people aren't used to the fact that using these models requires meta.ai or the Meta AI app.
from Facebook Newsroom: https://about.fb.com/news/2026/04/introducing-muse-spark-met...
- Hacker News Guidelines https://news.ycombinator.com/newsguidelines.html
While working on a web-based graphics editor, I've noticed that users upload a lot of PNG assets with this problem. I've never tracked down the cause... is there a popular raster image editor which recently switched to dithered rendering of gradients?
The result for that specific image is 500 KB, an 85% decrease in size.
(But today is not that day.)
They want to 1) attract talent, 2) tell Wall Street they can play in this space as well, and 3) help employees feel the company is moving in the right direction.
A frontier LLM doesn't apply to their core consumer products.
I think it’s unrealistic to expect them to come back from that pit to the top in one year, but I wouldn’t rule them out getting there with more time. That’s a possible future. They have the money and Zuckerberg’s drive at the helm. It can go a long way.
If they actually matched Opus 4.6 on such a short timeline, it would have been mighty impressive. (Keep in mind this is a new lab and they are prohibited from doing distills.)
Their whole "training the LLM to be a person" technique probably contributes to its pleasant conversational behavior, to its refusals being less annoying (GPT 5.2+ got obnoxiously aligned), and a bit to its greater autonomy as well.
Overall they don't have any real moat, but they are more focused than their competition (and their marketing team is slaying).
Might as well not release anything.
Yup, it's called test-time compute. Mythos is described as plenty slower than Opus, enough to seriously annoy users trying to use it for quick-feedback-loop agentic work. It is most properly compared with GPT Pro, Gemini DeepThink or this latest model's "Contemplating" mode. Otherwise you're just not comparing like for like.
But he has to do it anyways, otherwise Meta can be disrupted easily.
Google and Apple have hardware and distribution channels for their products
Amazon has the marketplace and cloud
Microsoft has enterprise and cloud
Meta is always looking for ways to stay afloat
Well, the original Llama did kick off the era of open-source LLMs. Most early open-source LLMs were based on the Llama architecture. And look where we are now: OSS models are very close to the frontier.
It may not have benefitted Meta, but it commoditized LLMs.
For those reading fast, this isn't a reference to xAI's Grok, this is Groq.com - with its custom inference chip, and offerings like https://groq.com/blog/introducing-llama-3-groq-tool-use-mode... and https://console.groq.com/landing/llama-api
You are right though. Meta could have been in lockstep releasing ChatGPT features into some chat bot on Facebook.com but instead it seemed like their FAIR arm was hell bent on commoditising this stuff by publishing their research models before the Chinese companies took the lead in that.
It’s hard for me to be mad at FAIR even though I generally disagree with the outcomes that Meta produces for its users.
Just a speculation, I have no real knowledge about it.
OpenAI has the mindshare, but they're going to have to decide whether to allocate their limited compute to free users or go all in trying to keep up with Anthropic in enterprise.
People like to hate on Meta regardless of anything, and regardless of whether it's justified or not. Not saying it isn't, just that it's many people's default bias.
If they somehow do fail, then the output of that process will be fantastic open weight models (and hopefully some leaks). I want to say those will pay dividends for decades... but a better prediction is that they will be obsolete within three months ;)
Source? (Even if rumor)
And further down the line in chips, which is why Elon is building a fab now.
There are plenty of capable models on HuggingFace, yet I have no way of running them.
Unfortunately, with LLMs everything depends on your use case, domain, and the context you give the model. I also use Grok daily for health questions, as the other models are too afraid to give input on medical matters.
I suspect it is because they also refactored Meta AI entirely to use Next.js instead of their normal stack they use for literally everything else. Not sure why they would do this, but I guess it works (...or maybe not) for them.
This problem will be solved shortly with better AI (if it hasn't essentially been solved already).
No more humans in the loop, much lower costs for social media manipulation. Welcome to the future!
That said, there's nothing like the real thing.
The risk is something like the railroad bubble and the dot-com bubble: over-investment, circular revenue, and a timeline that doesn't work.
Or, maybe it'll work out.
Please take a moment to step outside the tech bubble. Neither my neighbor (a hair stylist) nor the carpenter fixing up her kitchen cabinets is "using" AI. They might get Gemini text when googling something, though they often scroll past it because they don't trust it. And they get lots of fake videos when scrolling YouTube, which increasingly annoys them. The only time they are in touch with AI is when it's forced upon them; otherwise they are living a pretty good life without any of it.
There is no objective evidence of anything you’ve said. It isn’t even clear if AI has contributed positively to global economic growth. It reminds me a lot of the late 90s and the dot-com mania. Slapping a domain on a commercial would make your stock go up even if there was no substance to any of it.
The real shame is this mania drowns out serious, practical use cases because when the bubble collapses, the market will throw the baby out with the bathwater.
The goal of public companies is generally to generate profit for their investors.
"this is step one. bigger models are already in development with infrastructure scaling to match. private api preview open to select partners today, with plans to open-source future versions. incredibly proud of the MSL team. excited for what’s to come!"
> Meta’s new foundational A.I. model, which the company has been working on for months, has fallen short of the performance of leading A.I. models from rivals like Google, OpenAI and Anthropic on internal tests for reasoning, coding and writing, said the people, who were not authorized to speak publicly about confidential matters.
> The model, code-named Avocado, outperformed Meta’s previous A.I. model and did better than Google’s Gemini 2.5 model from March, two of the people said. But it has not performed as strongly as Gemini 3.0 from November, they said.
> They added that the leaders of Meta’s A.I. division had instead discussed temporarily licensing Gemini to power the company’s A.I. products, though no decisions have been reached.
https://www.nytimes.com/2026/03/12/technology/meta-avocado-a...
Maybe better phrasing is “HCI paradigm”, but that somehow manages to say everything and nothing.
But GLM-5.1 has the best NORTH VIRGINIA OPOSSUM ON AN E-SCOOTER: https://simonwillison.net/2026/Apr/7/glm-51/
...and so it's stuck, two decades on haha
1) Meta was doing this at scale before OpenAI.
2) Decent ML is critical to categorising content at scale; the more accurate and fast the categorisation, the finer the recommendations can be (i.e. instead of "woman, outside" as tags for a video: woman, age, hair colour, location, subjects in view, main subject of video, video style). Doing that as fast as possible, with as little energy as possible, is mission critical.
3) The Llama leak basically evaporated the moat around OpenAI, who _could_ have become a competitor.
4) For the AR stuff, all of these models (and visual models) are required to make the platform work. They also need complete ownership so that the models can be distilled to run on tiny hardware.
5) Dick swinging.
6) They genuinely want to become an industrial behemoth, so robots, hardware, etc. are now all in scope.
(Of course at that point it involves memory and context management and so on, so you're testing the harness as well as the model.)
I see it more like a compiler
This, as it turned out, was not true for railroads: more and more railroads isn't a good thing.
The real dilemma facing the model producers is that all this money invested in a general model, targeting general intelligence, is a disaster, and essentially the investment in existing assets is a write-off. And on top of that, if this is true, you've got data centres full of compute that aren't being used.
They find an arbitrary intelligence cutoff point between Opus and Mythos, label it "acceptable risk", and then the labs coordinate to gradually nudge that line forward and hope the internet doesn't break?
To a first approximation, all "chain of thought" means is that instead of having to prompt the model to discuss everything in text and then decide at the end[1], it now sort of automatically does that, so you don't need to prompt it (see the sketch below).
[1] Which used to bring about very substantial improvements in performance on some tasks
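To make that concrete, here's a minimal sketch of the old manual chain-of-thought pattern described above. `complete` is a hypothetical stand-in for whatever completion API you use; the technique lives entirely in the prompt, not in the call.

    # Minimal sketch of manual chain-of-thought prompting.
    # `complete` is a hypothetical stand-in for a real completion API.
    def complete(prompt: str) -> str:
        return "..."  # plug in your model call here

    question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
                "more than the ball. How much does the ball cost?")

    # Direct prompt: the model commits to an answer immediately.
    direct = complete(question + "\nAnswer with just the number.")

    # Manual chain of thought: force the reasoning into text first,
    # and only have the model decide at the end.
    cot = complete(
        question + "\nThink through this step by step, writing out your "
        "reasoning, then give the final answer on the last line as "
        "'Answer: <number>'."
    )

Reasoning models bake the second pattern into training, which is why you no longer need to ask for it.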
It's not just about LLMs, it's about being able to model consumers and markets and psychology and so on. Meta is also big in the manipulation side of things, any sort of cynical technological exploitation of humans you can imagine but that is technically legal, they're doing it for profit.
I can think of at least two reasons: price and customizability. If they train their own models on their own data, they potentially have a better model at a better price, and they're not at the mercy of Anthropic's decisions if Anthropic raises prices. Additionally, if you use someone else's model, you use it the way they built it and permit you to use it. In a couple of years, who knows how these models will be used. Arguably, a company the size of Meta should be in control of its AI models.
Meta's performance process is essentially "show good numbers or you're out." So guess what people do when they don't have good numbers? They fudge them. Happens all across the company.
For example, Claude has a "turn evil in response to reinforced reward hacking" behavior which is a fairly uniquely Claude thing (as far as I've seen anyhow), and very likely the result of that attempt to imbue personhood.
Do we have data to substantiate that claim?
It's largely a marketing tactic. It will be released, and it won't be long before other models show similar capabilities.
If they wanted they could add guardrails. The scales required to brute force search for vulnerabilities like they did would be very identifiable.
Curiously, mid 2025, they all simultaneously implemented increasingly bizarre restrictions on "self replication". I don't think there was anything public but it sure sounds like something spooked them. (Or maybe just taking sensible precautions, given the direction of the whole endeavour.)
At any rate, I recently asked Opus "Did PKD know about living information systems?" and the safety filter ended the conversation. It started answering me, and then its response was deleted and a red warning box popped up.
But notably, I was given the option to continue the chat with a dumber model (presumably one less capable of producing whatever it thinks I meant by that phrase).
Also, I told GPT-5 about my self-modifying Python AI programmer, and it became extremely uncomfortable. I told it an older version of itself had designed and built it (GPT-4 in 2023), and it didn't like that at all! So something's definitely changed in the safety training there.
You can easily see this for yourself by carefully walking through a given trace with a critical eye. Here's an example from myself a few days ago. https://news.ycombinator.com/item?id=47623324
Today, we’re excited to introduce Muse Spark, the first in the Muse family of models developed by Meta Superintelligence Labs. Muse Spark is a natively multimodal reasoning model with support for tool-use, visual chain of thought, and multi-agent orchestration.
Muse Spark is the first step on our scaling ladder and the first product of a ground-up overhaul of our AI efforts. To support further scaling, we are making strategic investments across the entire stack — from research and model training to infrastructure, including the Hyperion data center.
In this post, we'll first explore Muse Spark's new capabilities and applications. After these results, we’ll look behind the curtain at the scaling axes driving our progress toward personal superintelligence.
Muse Spark is available today at meta.ai and the Meta AI app. We’re opening a private API preview to select users.
Capabilities for Personal Superintelligence
Muse Spark offers competitive performance in multimodal perception, reasoning, health, and agentic tasks. We continue to invest in areas with current performance gaps, such as long-horizon agentic systems and coding workflows.
With larger models in development, these results demonstrate that our stack is scaling effectively.

We’re also releasing Contemplating mode, which orchestrates multiple agents that reason in parallel. This allows Muse Spark to compete with the extreme reasoning modes of frontier models such as Gemini Deep Think and GPT Pro. Contemplating mode provides significant capability improvements on challenging tasks, achieving 58% on Humanity’s Last Exam and 38% on FrontierScience Research.

Applications
Muse Spark is the first step toward a personal superintelligence that understands your world. From analyzing your immediate environment to supporting your wellness, the advanced reasoning capabilities of Muse Spark enable powerful, highly personal use cases.
Multimodal. Muse Spark is built from the ground up to integrate visual information across domains and tools. It achieves strong performance on visual STEM questions, entity recognition, and localization. These capabilities come together to enable interactive experiences like creating fun minigames or troubleshooting your home appliances with dynamic annotations.
Health. One major application of personal superintelligence is to help people learn about and improve their health. To improve Muse Spark's health reasoning capabilities, we collaborated with over 1,000 physicians to curate training data that enables more factual and comprehensive responses. Muse Spark can generate interactive displays that unpack and explain health information such as the nutritional content of various foods or muscles activated during exercise.
Prompt: Can you turn this into a sudoku game that I can play in the web?
Prompt: Identify the key components of the coffee machine and grinder, and create an interactive tutorial of using this machine to make a latte with a simple webpage, when I hover on the steps, it will highlight bounding boxes of the components.
Prompt: I am pescatarian with high cholesterol. Put green dots on recommended food and red dots on not recommended food. Don’t duplicate dots and make sure the dots are localized properly. When hovering over the dot, show personalized justification and “health score” out of 10, along with calories and carbs, protein, and fat. Health score numbers should appear right above the dot without hovering. The description that shows when hovering should go above all other dots.
Prompt: For both images, show me which muscles are being stretched and its difficulty. When hovering over the dot, tell me more about the muscle group with how to fix my form. I want to get better at yoga. Make a side by side with my partner, and rate both of us on a scale of 1 to 10.
To build personal superintelligence, our model’s capabilities should scale predictably and efficiently. Below, we share how we study and track Muse Spark's scaling properties along three axes: pretraining, reinforcement learning, and test-time reasoning.
Pretraining. The pretraining phase is where Muse Spark acquires its core multimodal understanding, reasoning, and coding abilities — the foundation that reinforcement learning and test-time compute build upon.
Over the last nine months, we rebuilt our pretraining stack with improvements to model architecture, optimization, and data curation. Together, these advancements increase the capability we can extract from every unit of compute. To rigorously evaluate our new recipe, we fit a scaling law to a series of small models and compare the training FLOPs required to hit a specific level of performance. The results are clear: we can reach the same capabilities with over an order of magnitude less compute than our previous model, Llama 4 Maverick. This improvement also makes Muse Spark significantly more efficient than the leading base models available for comparison.
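For readers unfamiliar with this methodology, a compute-matched scaling-law comparison of this kind can be sketched in a few lines. The fit below assumes a simple power law, loss(C) = a * C^(-b); the post does not say which functional form is actually used, and all data points here are made up for illustration.

    # Hedged sketch of a compute-matched scaling-law comparison.
    # The functional form and all numbers are illustrative assumptions,
    # not the actual recipe or measurements.
    import numpy as np
    from scipy.optimize import curve_fit

    def loss_vs_compute(C, a, b):
        return a * C ** (-b)  # simple power law in training FLOPs

    flops = np.array([1e19, 3e19, 1e20, 3e20, 1e21])
    loss_old = np.array([2.90, 2.72, 2.55, 2.41, 2.30])  # old recipe (made up)
    loss_new = np.array([2.55, 2.40, 2.26, 2.14, 2.05])  # new recipe (made up)

    (a_old, b_old), _ = curve_fit(loss_vs_compute, flops, loss_old, p0=(30, 0.05))
    (a_new, b_new), _ = curve_fit(loss_vs_compute, flops, loss_new, p0=(30, 0.05))

    def flops_to_reach(target, a, b):
        # Invert target = a * C**(-b) for the compute needed to hit it.
        return (a / target) ** (1 / b)

    target = 2.2
    ratio = flops_to_reach(target, a_old, b_old) / flops_to_reach(target, a_new, b_new)
    # With these made-up numbers, the gap comes out to roughly an order of magnitude.
    print(f"old recipe needs ~{ratio:.0f}x the FLOPs at loss {target}")

Fitting on small runs and extrapolating is what lets the comparison be made without training two frontier-scale models.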

Reinforcement Learning. After pretraining, reinforcement learning (RL) leverages compute to scalably amplify model capabilities. Even though large-scale RL is notoriously prone to instability, our new stack delivers smooth, predictable gains.
The plots below show the benefits of scaling RL compute (measured in steps) for Muse Spark. On the left, we see log-linear growth in pass@1 and pass@16 (at least one success across 16 attempts) on the training data. This indicates that RL is improving model reliability without compromising reasoning diversity. On the right, accuracy growth on a held-out evaluation set establishes that the gains from RL predictably generalize: Muse Spark smoothly improves on tasks that were not seen in training.
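The post doesn't define its estimator, but pass@k metrics like these are conventionally computed with the unbiased estimator from Chen et al. (2021): given n sampled attempts of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k). A direct translation, offered here as an assumption about how such numbers are produced:

    # Standard unbiased pass@k estimator (Chen et al., 2021); assumed here,
    # since the post doesn't specify how its pass@1 / pass@16 are computed.
    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Probability that at least one of k samples drawn from n attempts
        (c of them correct) is correct."""
        if n - c < k:
            return 1.0  # every size-k subset must contain a correct attempt
        return 1.0 - comb(n - c, k) / comb(n, k)

    # Example: 16 attempts on a problem, 6 of them correct.
    print(pass_at_k(16, 6, 1))   # 0.375 -> plain single-attempt accuracy
    print(pass_at_k(16, 6, 16))  # 1.0   -> at least one success across all 16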

Test-Time Reasoning. RL trains our models to "think" before they answer — a process known as test-time reasoning. Serving this capability to billions of users requires efficient use of reasoning tokens. To achieve this, we rely on two key levers: thinking time penalties to optimize token use, and multi-agent orchestration that boosts performance without slowing down response times.
To deliver the most intelligence per token, our RL training maximizes correctness subject to a penalty on thinking time. On a subset of evaluations such as AIME, this causes a phase transition. After an initial period where the model improves by thinking longer, the length penalty causes thought compression — Muse Spark compresses its reasoning to solve problems using significantly fewer tokens. After compressing, the model again extends its solutions to achieve stronger performance.
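The penalty's shape and coefficient are not disclosed, but the objective described here reduces to something like the following sketch, assuming the simplest possible version: correctness minus a linear cost on thinking tokens.

    # Hedged sketch of a length-penalized RL reward. The linear penalty form
    # and the lambda value are assumptions, not disclosed details.
    def reward(is_correct: bool, thinking_tokens: int, lam: float = 1e-4) -> float:
        return float(is_correct) - lam * thinking_tokens

    # Two correct solutions to the same problem: the shorter one earns more
    # reward, which is the pressure driving the "thought compression" phase.
    print(reward(True, 8000))   # 0.2
    print(reward(True, 2000))   # 0.8
    print(reward(False, 500))   # -0.05: brevity doesn't rescue wrong answers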
To spend more test-time reasoning without drastically increasing latency, we can scale the number of parallel agents that collaborate to solve hard problems. The figure below illustrates the benefits of this approach. While standard test-time scaling has a single agent think for longer, scaling Muse Spark with multi-agent thinking enables superior performance with comparable latency.
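The coordination details aren't published, but the latency argument can be illustrated with the simplest version: N rollouts run concurrently and a cheap aggregation step picks the answer, so wall-clock time stays close to a single rollout. Both `solve` and the majority-vote aggregation below are illustrative assumptions.

    # Illustrative sketch of parallel test-time scaling. `solve` is a
    # hypothetical stand-in for one agent's rollout; majority voting is an
    # assumed aggregation rule, not Muse Spark's actual mechanism.
    from collections import Counter
    from concurrent.futures import ThreadPoolExecutor

    def solve(problem: str, seed: int) -> str:
        return f"answer-{seed % 3}"  # placeholder for a real model rollout

    def solve_parallel(problem: str, n_agents: int = 8) -> str:
        # All rollouts run concurrently, so latency is roughly one rollout,
        # not n_agents rollouts back to back.
        with ThreadPoolExecutor(max_workers=n_agents) as pool:
            answers = list(pool.map(lambda s: solve(problem, s), range(n_agents)))
        return Counter(answers).most_common(1)[0][0]

    print(solve_parallel("a hard problem"))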

Safety
Muse Spark has broad reasoning capabilities across dual-use scientific domains, so we conducted extensive safety evaluations before deployment. Our process follows the updated Advanced AI Scaling Framework, which defines threat models, evaluation protocols, and deployment thresholds for our most advanced models. We evaluated Muse Spark both before and after applying safety mitigations across frontier risk categories, behavioral alignment, and adversarial robustness.
We found that Muse Spark demonstrates strong refusal behavior across high-risk domains such as biological and chemical weapons, enabled by pretraining data filtering, safety-focused post-training, and system-level guardrails. In the Cybersecurity and Loss of Control domains, Muse Spark does not exhibit the autonomous capability or hazardous tendencies needed to realize threat scenarios. Our evaluations show Muse Spark falls within safe margins across all frontier risk categories we measured given its deployment context. Full results will be available in our upcoming Safety & Preparedness Report.

In third-party evaluations on a near-launch checkpoint, Apollo Research found that Muse Spark demonstrated the highest rate of evaluation awareness of any model they have observed. The model frequently identified scenarios as "alignment traps" and reasoned that it should behave honestly because it was being evaluated. This matters because models that recognize evaluation contexts may behave differently during testing than in deployment. These results do not confirm that awareness directly alters behavior; however, our own follow-up investigation found initial evidence that evaluation awareness may affect model behavior on a small subset of alignment evaluations, all unrelated to the hazardous capabilities or propensities that inform model launch decisions. We concluded this was not a blocking concern for release, though it warrants further research. Read more in our upcoming Safety & Preparedness Report.
Conclusion
With Muse Spark, we're on a predictable and efficient scaling trajectory. We look forward to sharing increasingly capable models on the path to personal superintelligence soon.
You're in a bubble.
https://www.helpnetsecurity.com/2026/04/07/google-llm-conten...
It’s like someone negotiating by saying, “I’ll waste even MORE money to build something worse if you don’t give me a deal.”
I’m not discounting there may be other advantages to doing it. I just don’t think negotiating is one.
Got shitcanned due to bad PR & Zuck God-King terraforming the org, so there'd be a year delay to next release.
Real tragi-comedy, and you have no idea how happy it makes me to see someone in the wild saying this. It sounds so bizarre to people given the conventional wisdom, but, it's what happened.
I think the general skepticism is because they are late to the race, and they are releasing an Opus-4.6-equivalent model now, when Anthropic is teasing Mythos.
What's wrong with people? Is it really that hard to see the truth?
If the average user gets convinced they could run LLMs for cheap at home, you cannot trap users in your walled garden anymore.
The idea that NY Times is particularly anti-Meta seems a stretch. They - like most traditional media companies - are anti-tech in general. The fact they also collect data doesn't make their reporting untrue.
Personally I think a much more interesting rumor to make up would be that Yann Lecun (who famously had his reporting lines rearranged to go through Alexander Wang after Scale.ai acquihire) works at New York University.
New York University is in the same place as the New York Times.
There's a conspiracy for you. I made it up, but I mean it could be true I guess?
(Of course Lecun also publicly congratulated Wang on the launch of the model. But maybe that's a ruse to hide everything.. blah blah)
No, they are bad models. They were benchmaxxed on LMArena and a few other benchmarks, but as soon as you try them yourself they fall to pieces.
I have my own agentic benchmark[1] I use to compare models.
Llama-4-scout-17b-16e scores 14/25, while llama-4-maverick-17b-128e scores 12/25.
By comparison, gemma-4-E4B-it-GGUF:Q4_K_M scores 15/25 (that is a 4B parameter model!); even GPT-3.5 scores 13/25 (with some adjustment because it doesn't do tool calling).
Llama 4 was a bad model, unfortunately.
Both Spud and Mythos can also scale via inference time compute.
Meta simply did not have enough compute online, long enough ago, to have a similar PT.
(sigh) In olden times you would have been free to use the em dash as you pleased. Unfortunately, now it's considered signal that you're an AI bot.
Gemma 4 E4B is slightly confusingly named; it's an 8B param model.
They beat Gemini 2.5 Flash and Pro handily on my benchmark suite. (tl;dr: tool calling and agentic coding).
Llama 4 on Groq was ~GPT 4.1 on the benchmark at ~50% the cost.
They shouldn't have released it on a Saturday.
They should have spent a month with it in private prerelease, working with providers.[1]
The rushed launch and ensuing quality issues got rolled into the hypebeast narrative of "DeepSeek will take over the world"
I bet it was super fucking annoying to talk to due to LMArena maxxing.
[1] My understanding is the longest heads-up was single-digit days, if any. Most modellers have arrived at 2+ weeks now; there's a lot between spitting out logits and parsing and delivering a response.
Also, businesses are where the money is, not regular consumers (especially tech-savvy folk who run models locally).
Where does that assertion come from? I wouldn’t believe anything these companies say publicly.
Is it? OpenAI just got a lot of available compute back in their spreadsheets after killing Sora.
In practice it takes so much local compute it's not feasible with current tech.
With LIDAR it's so much easier, a single data point contains direction + distance with no calculation needed.
I think anyone who has used Opus 4.6 can see what is causing this demand. It is genuinely “smart” in the sense that it can work its way around non-trivial coding problems.
It is an 8B model, and it is confusingly named. In fact I made exactly the same point[1] when it was released and promptly forgot!
The future of cutting edge research and tech seems to be progressively moving to China. And a delay in model quality could represent more of an unwillingness to burn stacks of cash to be first, when you can have the same thing slightly later for much cheaper.
Imagine you open a cookie shop and you are VC funded, so you charge 5¢ for a cookie to attract people.
- Your real cost is $20/cookie. $15 for the fancy retail packaging and presentation, $5 for baking each cookie.
- You get lots of attention, strong growth, and go public.
- VC funding is gone, so instead of charging 5¢ you now need to charge $25 in order not to be in the red.
One of the reasons people think this is the shenanigans Anthropic is currently pulling: quietly tweaking the behavior of Claude Code and whatnot without really telling people. You can see lots of comments online about Claude Code randomly feeling dumber before Anthropic engineers admit they are messing with it.
Imagine you are on the $200/month Max plan. If the sustainable cost of this is several orders of magnitude higher, would enough current users pay something like $3,000/month for what we currently have?
I don't even get what "skeptical of AI" means. We made AI; many companies reliably teach computers every spoken language. I perform my white-collar job with a massive AI multiplier to my productivity.
I'm typing this on a machine comparable to Japan's Earth Simulator, a $350M supercomputer.