Apple Foundation Models

This is Apple commoditizing LLMs while keeping control of the UX.

They are a hardware company and will keep selling the best machine for AI use. Well done.

> a Swift package that makes Claude available as a server-side language model in Apple's Foundation Models framework

Ahh I was hoping for the opposite: all of the existing features of Claude Code but somehow running locally on my laptop's neural engine. A pipe dream on an M2 with 8 GB of RAM, but I had a flicker of hope there.

This isn't Claude-specific. Developers can also write apps that call Google's server-based Gemini models.

> At WWDC, Apple announced that it's opening its Foundation Models framework to third-party cloud model providers. Starting with iOS 27, macOS 27, iPadOS 27, visionOS 27 and watchOS 27, model providers can implement the new public LanguageModel protocol to provide a common interface for model inference. We've made Gemini models available to the Foundation Models framework through the Firebase Apple SDK.

This provides a fully native development experience — cloud-hosted Gemini models can plug directly into the Foundation Models framework using the same API. That means the on-device Apple model and cloud-hosted Gemini models sit behind a shared API surface, so you can easily swap between local and cloud inference to fit your use case.

https://blog.google/innovation-and-ai/technology/developers-...

While I'm happy with Apple introducing this abstraction, my main concern is with local models.

Take Gemma4 as an example: from a user's point of view, if 10 apps each use the same model and each downloads its own copy, the phone will be bloated.

I still don't understand whether Apple provides a way for multiple apps to use the same on-device model (without tricky namespaces and permissions).

I didn't see anything suggesting that's the case.

Is this Apple encouraging developers to go through its API abstraction layer for LLMs, so that when it launches its own model (which we’ve heard it has been spending lots of money training, and which might be involved with Siri or the current Apple AI) it can help devs make a seamless transition? Or is it just a developer nicety, or something else?

I think this is just Apple planning for their on-device models getting better, which makes sense given they have access to Gemini now. If developers use this for all their code calling an external LLM, then as Apple's model becomes more capable and covers more use cases it'll be easy to switch to it at individual call sites. That'll give apps better UX and save developers money on a bill that Apple doesn't get a cut of.

First, Microsoft broke kayfabe by putting "Copilot is for entertainment purposes only" in the Copilot terms of use and adding warnings to Copilot for Excel: "avoid using COPILOT for ... any task requiring accuracy or reproducibility ... Tasks with legal, regulatory or compliance implications".

Then Apple quietly refuses to participate by not investing tens or hundreds of billions in creating a competing LLM. Sure, they resell Claude for the marks or utilize Gemini to placate the gullible fools but they know what's up.

https://www.microsoft.com/en-us/microsoft-copilot/for-indivi...

https://support.microsoft.com/en-US/Excel/copilot-function

How can you practically use this in software that you deploy to users? Asking a user to create and enter their own API key is too high a bar for good UX.

> Requests go directly from your app to the Claude API; Apple is not in the request path and does not see prompts or responses.

I know this is from a developer perspective. But as a consumer this is just funny.

A coding agent is itself an imposed layer. Now they are adding one more layer? I often think of a coding agent as the vendor supervisor from the body shops of the '90s, who promised the customer everything under the sun and thrashed the poor contractor to deliver. Coding agents consume 10x more tokens, much like the gap between what body shops charged their customers and what they paid the contractors. As a simple test: the same task that makes the model run out of context when used via a coding agent runs fine when prompted directly.

Layers are luxury and remove control and transparency.

Since Claude is technically a subscription, Apple will slowly weasel its way into skimming 30% of the token spend.

Is this included in the free AI tier for small developers? Big news if so

Shared daemon is the only way this makes sense on-device. A 3B model at 4-bit is roughly 2GB - three apps loading their own copies would eat an 8GB phone.
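
To make the arithmetic above concrete, here is a rough sketch in Swift. It counts weights only; the function name and the three-app scenario are illustrative assumptions, and real footprints will be higher once KV cache and runtime overhead are included.

```swift
// Back-of-envelope estimate of weight memory for a quantized model.
// Counts weights only; KV cache, activations and runtime overhead are ignored.
func weightBytes(parameters: Double, bitsPerWeight: Double) -> Double {
    return parameters * bitsPerWeight / 8.0
}

let gb = 1_000_000_000.0  // decimal gigabytes

// Assumed figures from the comment above: 3B parameters at 4 bits each.
let oneCopyGB = weightBytes(parameters: 3e9, bitsPerWeight: 4) / gb
let threeCopiesGB = 3 * oneCopyGB

print("one copy: \(oneCopyGB) GB, three copies: \(threeCopiesGB) GB")
```

That works out to 1.5 GB of weights per copy, so "roughly 2GB" is plausible once overhead is added, and three private copies would take more than half of an 8GB device.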

I think Apple has a fairly good plan for supplying a common API and default on-device models.

What confuses me about this article is: the code examples (Python, Ruby, etc.) look to me like the original Anthropic APIs, not Apple’s abstraction. Did I miss something?

From an app developer's standpoint, why would anyone ship Claude keys like that... or am I missing something? From a consumer's standpoint, I guess they can use their own keys, but as you can imagine that is not very user-friendly.

I’m surprised to see the model names hardcoded as an enum (e.g. `.sonnet4_6`), instead of a string with model discovery so that the user can select their preferred model without having to get a new app version through the App Store to support newer models.
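
A sketch of the string-backed alternative, for illustration only. `ModelID` and the identifier strings below are hypothetical and not part of the Claude package or Apple's framework; the point is just that a struct wrapping a string keeps the `.sonnet4_6`-style shorthand while still accepting names the app was not compiled with.

```swift
// Hypothetical string-backed model identifier (not the package's real type).
struct ModelID: RawRepresentable, Hashable {
    let rawValue: String

    init(rawValue: String) { self.rawValue = rawValue }

    // Known models still get dot-syntax shorthand, like enum cases.
    // The identifier string is a placeholder, not a confirmed API name.
    static let sonnet4_6 = ModelID(rawValue: "example-sonnet-4-6")
}

// A model list that would be fetched at runtime
// (hard-coded here to keep the sketch self-contained).
let discovered = ["example-sonnet-4-6", "example-newer-model"].map { ModelID(rawValue: $0) }

// The user's choice can be a model released after the app shipped.
let chosen = discovered.last ?? .sonnet4_6
print(chosen.rawValue)
```

The trade-off is that an unknown string is only rejected at request time, whereas an enum rejects it at compile time.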

This seems smart. Apple, despite not really leading in AI themselves, are right on the hot path of where developers are going to yolo slop into the ecosystem. Makes a tonne of sense to define a nice clean API that places like Anthropic can build on top of and expose to developers.

It's also smart for them to make sure the billing goes direct from Anthropic to the developer. The initial thought is "That means Apple's not taking a cut", but from the other side of it, developers who use this API are going to have to expose that cost to customers somehow, and that translates to subscriptions, in-app purchases etc., on top of which Apple will get its 30%.

So where does the API key reside? You can’t ship it in the iOS client, since anyone can read and abuse it.

I didn’t understand what they were doing with Apple Foundation Models until this. It made it sound like they were training their own. Good strat tho!

> Usage is billed to your Anthropic account at standard API pricing.

While expected, it’s still a bummer.

Does "Apple Intelligence" need to be Turned On for this as well?

So actually the most successful AI was OpenRouter Intelligence? Pronounced as OÏ.

> A key bundled into an app is extractable from the shipping binary, and anyone who extracts it can make requests billed to your account. Use .apiKey for development only, and switch to a proxy before release.

I don't like this model. Then all the user data is visible to the proxy.

Far better would be some kind of micro payment architecture where a wallet is on the users device and coins are attached to each request.

We just need to live in the alternate universe where micro payments succeeded.
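
For reference, a minimal sketch of what "switch to a proxy" could look like on the client, assuming a backend you run yourself. The URL, the `/v1/generate` path, the JSON shape and the per-user session token are all hypothetical; the package's own proxy configuration may look different. The idea is that the app only ever holds a revocable per-user token, while the Anthropic key stays on the server.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

// Hypothetical request body understood by your own proxy.
struct ProxyPrompt: Codable {
    let prompt: String
}

// Builds a request to a developer-run proxy. The proxy, not the app,
// attaches the Anthropic API key before forwarding upstream.
func makeProxyRequest(prompt: String, sessionToken: String) -> URLRequest {
    // Placeholder endpoint; replace with your own backend.
    let url = URL(string: "https://proxy.example.com/v1/generate")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // Per-user token issued after sign-in; it can be rate-limited or revoked.
    request.setValue("Bearer \(sessionToken)", forHTTPHeaderField: "Authorization")
    // Encoding a single string field is not expected to fail; the body is nil if it does.
    request.httpBody = try? JSONEncoder().encode(ProxyPrompt(prompt: prompt))
    return request
}

let request = makeProxyRequest(prompt: "Summarize this note", sessionToken: "user-session-token")
```

This does not answer the privacy objection: the proxy still sees every prompt. It only stops the billing key from being extracted from the binary.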

Can someone explain to me what this means in the context of Apple and ChatGPT/Claude/Mistral...?

Serious question: this looks like a thin library on an API. Why is it a big deal?

Misleading title. This is about Claude for Apple Foundation Models, not about Apple Foundation Models

I'm not sure if I want to touch anything Anthropic anymore.

What I'm curious about is whether this is actually on-device. Apple's framework caps local models around 3B params last I looked, and Claude is way bigger than that. So either there's some hybrid setup I haven't seen documented, or this is mostly a Claude SDK in FM clothing. Anyone tried it on a plane?

What it is

Apple's Foundation Models framework (shipping in iOS 27 / macOS 27 this fall) is the standard Swift API for on-device AI — the same API Apple uses for their own small model. This package makes Claude plug into that same API as a drop-in swap.

  // Apple's on-device model
  let localSession = LanguageModelSession(model: SystemLanguageModel.default)

  // Claude: same API, just a different model constructor
  let claudeSession = LanguageModelSession(model: ClaudeLanguageModel(name: .sonnet4_6, auth: auth))

One API, two tiers. You write your app once against the Foundation Models protocol. On-device model handles fast/free/private tasks; Claude handles heavy reasoning, long context, or capability gaps — you swap the model, not your code.

You don't call the Anthropic API directly. Apple's framework handles streaming, tool calling, and structured output (@Generable) — you just get Claude's capability through it.
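
The "swap the model, not your code" idea can be sketched without the real framework. Everything below is a stand-in: `TextModel`, `OnDeviceStub`, `CloudStub` and `pickModel` are hypothetical names, not Foundation Models or Claude package APIs, and the routing rule is an assumption for illustration.

```swift
// Stand-in for a shared model interface (not Apple's LanguageModel protocol).
protocol TextModel {
    var name: String { get }
    func respond(to prompt: String) -> String
}

struct OnDeviceStub: TextModel {
    let name = "on-device"
    func respond(to prompt: String) -> String { "[\(name)] \(prompt)" }
}

struct CloudStub: TextModel {
    let name = "cloud"
    func respond(to prompt: String) -> String { "[\(name)] \(prompt)" }
}

// Assumed routing rule: short prompts stay local, long ones go to the cloud.
func pickModel(for prompt: String, localLimit: Int = 200) -> any TextModel {
    if prompt.count <= localLimit { return OnDeviceStub() }
    return CloudStub()
}

// The call site is identical for both tiers; only the model value changes.
func run(_ prompt: String) -> String {
    return pickModel(for: prompt).respond(to: prompt)
}

print(run("Fix this typo"))
```

In a real app the routing decision would more likely depend on task type or device capability than on prompt length; the length check is only there to keep the example small.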

Why would I want a nerfed model?

This was expected. Apple will carefully choose what AI people can use in its ecosystem and how, and will enforce that. I hope the "Apple Foundation Models" ecosystem grows with support from major model providers.

> Layers are luxury and remove control and transparency.

You wouldn't use this when building a coding agent.