Gemma 4 on iPhone

Impressive model, for sure. I've been running it on my Mac, now I get to have it locally in my iPhone? I need to test this. Wait, it does agent skills and mobile actions, all local to the phone? Whaaaat? (Have to check out later! Anyone have any tips yet?)

I don't normally do the whole "abliterated" thing (dealignment) but after discovering https://github.com/p-e-w/heretic , I was too tempted to try it with this model a couple days ago (made a repo to make it easier, actually) https://github.com/pmarreck/gemma4-heretical and... Wow. It worked. And... Not having a built-in nanny is fun!

It's also possible to make an MLX version of it, which runs a little faster on Macs, but won't work through Ollama unfortunately. (LM Studio maybe.)

Runs great on my M4 MacBook Pro w/128GB and likely also runs fine under 64GB... machines with less memory might require lower-bit quantizations.
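
For a rough feel of why memory size and quantization trade off, here's a back-of-the-envelope sketch. It only counts the weights (no KV cache, no runtime overhead), and the 31B parameter count is just the dense Gemma 4 size mentioned elsewhere in this thread, so treat the numbers as an estimate rather than a measurement.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Assumption: a 31B-parameter dense model; real usage is higher once the
# KV cache and runtime overhead are added on top of the weights.
for bits in (16, 8, 4):
    print(f"31B weights at {bits}-bit: ~{weight_memory_gb(31, bits):.1f} GB")
```

Halving the bits per weight halves the weight footprint, which is why a smaller machine pushes you toward 4-bit or lower.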

I specifically like dealigned local models because if I have to get my thoughts policed when playing in someone else's playground, like hell am I going to be judged while messing around in my own local open-source one too. And there's a whole set of ethically-justifiable but rule-flagging conversations (loosely categorizable as things like "sensitive", "ethically-borderline-but-productive" or "violating sacred cows") that are now possible with this, and at a level never before possible until now.

Note: I tried to hook this one up to OpenClaw and ran into issues

To answer the obvious question: yes, this sort of thing enables bad actors more (as do many other tools). Fortunately, there are far more good actors out there, and bad actors don't listen to rules that good actors subject themselves to, anyway.

This app is cool and it showcases some use cases, but it still undersells what the E2B model can do.

I just made a real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B. I posted it on /r/LocalLLaMA a few hours ago and it's gaining some traction [0]. Here's the repo [1]

I'm running it on a MacBook instead of an iPhone, but based on the benchmark here [2], you should be able to run the same thing on an iPhone 17 Pro.

[0] https://www.reddit.com/r/LocalLLaMA/comments/1sda3r6/realtim...

[1] https://github.com/fikrikarim/parlor

[2] https://huggingface.co/litert-community/gemma-4-E2B-it-liter...

This is awesome!

1) I am able to run the model on my iPhone and get good results. Not as good as Gemini in the cloud, but good.

2) I love the “mobile actions” tool calls that allow the LLM to turn on the flashlight, open maps, etc. It would be fun if they added Siri Shortcuts support. I want the personal automation that Apple promised but never delivered.

3) I am so excited for local models to be normalized. I build little apps for teachers and there are stringent privacy laws involved that mean I strongly prefer writing code that runs fully client-side when possible. When I develop apps and websites, I want easy API access to on-device models for free. I know it sort of exists on iOS and Chrome right now, but as far as I’m aware it’s not particularly good yet.

OP here. It is my firm belief that the only realistic use of AI in the future is either locally on-device for almost free, or in the cloud but way more expensive than it is today.

The latter option will only be used for tasks where humans are more expensive or much slower.

This Gemma 4 model gives me hope for a future Siri or other with iPhone and macOS integration, “Her” (as in the movie) style.

Is it me or does the App Store website look... fake? The text in the header ("Productiviteit", "Alleen voor iPhone") looks pixelated, like it was edited in Paint; the header background is flickering; the app icon and screenshots are very low quality; and the title of the website is incomplete ("App Store voor iPho...").

Nice! Tried it on an iPhone 16 Pro and got 30 TPS from the Gemma-4-E2B-it model.

The phone got considerably hot while inferencing, though. It’s quite an impressive performance and I cannot wait to try it myself in one of my personal apps.

My son just started using 2B on his Android. I mentioned that it was an impressively compact model and next thing I knew he had figured out how to use it on his inexpensive 2024 Motorola and was using it to practice reading and writing in foreign languages.

These new models are very impressive. There should be a massive speedup coming as well: AI Edge Gallery is running on the GPU, but the NPUs in recent high-end processors should be much faster. The A18 chip, for example (MacBook Neo and iPhone 16 series), has 35 TOPS of Neural Engine vs. 7 TFLOPS of GPU. Similar story for Qualcomm.

Are these models open source? If so this is Google’s attempt to collect user data from their models.

My iPhone 13 can’t run most of these models. A decent local LLM is one of the few reasons I can imagine actually upgrading earlier than typically necessary.

It doesn’t render Markdown or LaTeX. The scrolling is unusable during generation. E4B failed to correctly account for convection and conduction when reasoning about the effects of thermal radiation (31B was very good). After 3 questions in a session (with thinking), E4B went off the rails and started emitting nonsense fragments before the stated token limit was hit (unless it isn’t actually checking).

Most of the models are not available. I’m guessing they will become available soon enough… At least I hope.

E4B is pretty good for extracting tables of items from receipt scans and inferring categories, wish this could be called from within a shortcut to just select a photo and add the extracted table to the clipboard

It would be very helpful if the chat logs could (optionally) be retained.

I see a phenomenal opportunity for old phone re-use by arraying them in some dock and making them be my "home AI."

I think with this Google starts a new race: the best local model that runs on phones.

Gemma 4 is great: https://aibenchy.com/compare/google-gemma-4-31b-it-medium/go...

I assume it is the 26B A4B one, if it runs locally?

I'm able to sweet-talk the gemma-4-e2b-it model on an iPhone 15 into solving an hCaptcha screenshot. This small model is surprisingly capable!

Gemma 4 E4B is an incredible model for all the home assistant stuff I normally used Qwen3.5 35BA4B + Whisper for, while leaving me with way more empty VRAM for other bullshit. It works as a drop-in replacement for all of my "turn the lights off" or "when's the next train" type queries and does a good job of tool use. This is really the first time vramlets get a model that's reliably useful day to day locally.

I'm curious/worried about the audio capability. I'm still using Whisper, as audio support hasn't landed in llama.cpp, and I'm not excited enough to temporarily rewire my stuff to use vLLM or whatever their reference impl is. The vision capabilities of Gemma are notably (thus far, could be impl-specific issues?) much, much worse than Qwen's (even the big MoE and dense Gemma models are much worse); hopefully the audio is at least on par with medium Whisper.

Would it work locally on a Mac Pro M4 24gb? If so I'd really appreciate a step-by-step guide.

That's a great project! I just wondered whether Google would have a problem with you using their trademark

How do these compare to Apple's Foundation Models, btw?

How new of an iPhone model is needed?

I recently got to a first practical use of it. I was on a plane, filling in a landing card (what a silly thing these are). I looked up my hotel address using a Qwen model on my iPhone 16 Pro. It was accurate. I was quite impressed.

After some back and forth the chat app started to crash tho, so YMMV.

It’s gotta be free!?!? Right!?!? Oh oh wait

I run mlx models with omlx[1] on my mac and it works really well.

[1] https://github.com/jundot/omlx

> And there's a whole set of ethically-justifiable but rule-flagging conversations (loosely categorizable as things like "sensitive", "ethically-borderline-but-productive" or "violating sacred cows") that are now possible with this, and at a level never before possible until now.

I checked the abliterate script and I don't yet understand what it does or what the result is. What are the conversations this enables?

I tried it on my mac, for coding, and I wasn't really impressed compared to Qwen.

I guess there are things it's better at?

Haven't built anything on the agent skills platform yet, but it's pretty cool imo.

On Android the sandbox loads an index.html into a WebView, with standardized string I/O to the harness via some window properties. You can even return a rendered HTML page.

Definitely hacked together, but feels like an indication of what an edge compute agentic sandbox might look like in future.

>there's a whole set of ethically-justifiable but rule-flagging conversations (loosely categorizable as things like "sensitive", "ethically-borderline-but-productive" or "violating sacred cows") that are now possible with this, and at a level never before possible until now.

Mind giving us a few of the examples that you plan to run in your local LLM? I am curious.

Parlor is so cool, especially since you’re offering it for free. And a great use case for local LLMs.

For me the hallucination and gaslighting is like taking a step back in time a couple of years. It even fails the “r’s in strawberry” question. How nostalgic.

It’s very impressive that this can run locally. And I hope we will continue to be able to run couple-year-old-equivalent models locally going forward.

> or in the cloud but way more expensive than it is today.

Why? It's widely understood that the big players are making profit on inference. The only reason they still have losses is because training is so expensive, but you need to do that no matter whether the models are running in the cloud or on your device.

If you think about it, it's always going to be cheaper and more energy-efficient to have dedicated cloud hardware to run models. Running them on your phone, even if possible, is just going to suck up your battery life.

If you can run free models on consumer devices, why do you think cloud providers cannot do the same, except better and bundled with a ton of value worth paying for?

A local model running on a phone owned and controlled by the vendor is still not really exciting, imho.

It may be physically "local" but not in spirit.

this is not that first step towards your dream

Did you really watch “Her” and think this is a future that should happen??

Seriously????

Here's the US version of the same page: https://apps.apple.com/us/app/google-ai-edge-gallery/id67496...

The design quality is still poor. But that's the new Apple. Design is no longer one of their core strengths.

It's the Dutch version, see /nl/ in the URL.

If you just go to https://apps.apple.com/ it does look better, but I agree, still a bit "off".

Issues caused by a low effort localization?

On my iPhone it opens on the App Store app, so it looks fine to me.

What browser are you using? I don't see any of this behavior on Firefox...

Everything renders crystal clear with Firefox on GrapheneOS.

Nothing weird on my side

That’s nuts actually for such a low power chip. Can’t wait to see the M series version of that.

I’m sure very fast TPUs in desktops and phones are coming.

I wonder why the cutoff date for 3n-E4B-it is Oct 2023. That's really far in the past.

That's a great project! I just wondered whether Google would have a problem with you using their trademark

This is an app published by Google itself

How do these compare to Apple's Foundation Models, btw?

So much better. Hard to quantify, but even the small Gemma 4 models have that feels-like-ChatGPT magic that Apple's models are lacking.

AFM had a 4096 token context window and this can be configured to have a 32k+ token context window, for one.

No, only E2B and E4B.

Could you clarify what you mean by 'open-ended' in this context, since both initiatives are essentially open-source?

I run mlx models with omlx[1] on my mac and it works really well.

[1] https://github.com/jundot/omlx

Holy hell, how new is this? I've never heard of it, looks great!

I checked the abliterate script and I don't yet understand what it does or what the result is. What are the conversations this enables?

LLMs are very helpful for transcribing handwritten historical documents, but sometimes those documents contain language/ideas that a perfectly aligned LLM will refuse to output. Sometimes as a hard refusal, sometimes (even worse) by subtly cleaning up the language.

In my experience the latest batch of models are a lot better at transcribing the text verbatim without moralizing about it (i.e. at "understanding" that they're fulfilling a neutral role as a transcriber), but it was a really big issue in the GPT-3/4 era.

1) Coming up with any valid criticism of Islam at all (for some reason, criticisms of Christianity or Judaism are perfectly allowed even with public models!).

2) Asking questions about sketchy things. Simply asking should not be censored.

3) I don't use it for this, but porn or foul language.

4) Imitating or representing a public figure is often blocked.

5) Asking security-related questions when you are trying to do security.

6) For those who have had it, people who are trying to use AI to deal with traumatic experiences that are illegal to even describe.

Many other instances.

Realistically, a lot of people do this for porn.

In my experience, though, it's necessary to do anything security related. Interestingly, the big models have fewer refusals for me when I ask e.g. "in <X> situation, how do you exploit <Y>?", but local models will frequently flat out refuse, unless the model has been abliterated.

The in-ter-net is for porn

I tried it on my mac, for coding, and I wasn't really impressed compared to Qwen.

I guess there are things it's better at?

You're comparing apples to oranges there. Qwen 3.5 is a much larger model at 397B parameters vs. Gemma's 31B. Gemma will be better at answering simple questions and doing basic automation, and codegen won't be its strong suit.

Mind giving us a few of the examples that you plan to run in your local LLM? I am curious.

I'm not sure what you're angling at but I already gave a set of questions that are ethically legitimate yet routinely censored by the public models:

https://news.ycombinator.com/item?id=47654013

Not to mention that doing what the big model makers do literally dumbs the model down.

They should at least allow something like letting you prove your age and identity to give you access to better/unaligned models, maybe even requiring a license of some sort. Because you know what? SOMEONE in there absolutely has access to the completely uncensored versions of the latest models.

Parlor is so cool, especially since you’re offering it for free. And a great use case for local LLMs.

Thanks! Although I can't claim any credit for it. I just spent a day gluing together what other people have built. Huge props to the Gemma team for building an amazing model and also an inference engine that's focused on edge devices [0]

[0] https://github.com/google-ai-edge/LiteRT-LM

For me the hallucination and gaslighting is like taking a step back in time a couple of years. It even fails the “r’s in strawberry” question. How nostalgic.

It’s very impressive that this can run locally. And I hope we will continue to be able to run couple-year-old-equivalent models locally going forward.

I haven't seen anybody else post it in this thread, but this is running on 8GB of RAM. It's not the full Gemma 4 31B model. It's a completely different thing from the full Gemma 4 experience you'd get running the flagship model, almost to the point of being misleading.

It's their E2B and E4B variants (so 2B and 4B but also quantized)

https://ai.google.dev/gemma/docs/core/model_card_4#dense_mod...

Strangely, reasoning is not on by default. If you enable it, it answers as you'd expect.

> or in the cloud but way more expensive than it is today.

> It's widely understood that the big players are making profit on inference.

This is most definitely not widely understood. We still don't know yet. There are tons of discussions where people disagree on whether it really is profitable. Unless you have proof, don't say "this is widely understood".

The big players are plausibly making profits on raw API calls, not subscriptions. These are quite costly compared to third-party inference from open models, but even setting that up is a hassle and you as an end user aren't getting any subsidy. Running inference locally will make a lot of sense for most light and casual users once the subsidies for subscription access cease.

Also while datacenter-based scaleout of a model over multiple GPUs running large batches is more energy efficient, it ultimately creates a single point of failure you may wish to avoid.

> It's widely understood that the big players are making profit on inference.

If you add in the cost of training, it’s not profitable.

Not including the cost of training is a bit like saying the only cost of a cup of coffee is the paper cup it’s in. The only way OpenAI gets to charge for inference is by selling a product people can’t get elsewhere for much cheaper, which means billions in R&D costs. But because of competition, each model effectively has a “shelf life”.

A laptop/desktop could work. Most systems are on a charger most of the time anyway.

> It's widely understood that the big players are making profit on inference.

Are they? Or are they just saying that to make their offerings more attractive to investors?

Plus I think most people using agents for coding are using subscriptions which they are definitely not profitable in.

Locally running models that are snappy and mostly as capable as current sota models would be a dream. No internet connection required, no payment plans or relying on a third party provider to do your job. No privacy concerns. Etc etc.

> It's widely understood that the big players are making profit on inference.

I love the whole “they are making money if you ignore training costs” bit. It is always great to see somebody say something like “if you look at the amount of money that they’re spending it looks bad, but if you look away it looks pretty good” like it’s the money version of a solar eclipse

How is Google going to collect user data from a locally running model?

Because that's Gemma 3, not 4.

> Coming up with any valid criticism of Islam at all (for some reason, criticisms of Christianity or Judaism are perfectly allowed even with public models!).

When’s the last time you tried this? ChatGPT and Gemini have no trouble responding with all the common criticisms of Islam.

The manufacturing of biologics can be heavily censored to an absurd degree. I don’t know about Gemma 4 in particular.

Realistically, a lot of people do this for porn.

From what I've seen, Gemma 4 doesn't refuse a lot regarding sex; it only needs a little nudging in the right direction sometimes.

But it does refuse to be critical of the usual topics: Israel, Islam, trans, or race.

So wanting to discuss one of those is the real reason people would use an uncensored model.

The in-ter-net is for porn

that song is going to be stuck in my head all day now. lol

Qwen3.5 comes in various sizes (including 27B), and judging by the posts on HN, /LocalLlama etc., it seems to be better at logic/reasoning/coding/tool calling compared to Gemma 4, while Gemma 4 is better at creative writing and world knowledge (basically nothing changed from the Qwen3 vs. Gemma3 era)

Gemma 4 31B is still not impressive at coding compared to even Qwen 3.5 27B. It's just not its strong suit.

So far Gemma 4 seems excellent at role playing and document analysis, and decent at making agentic decisions.

> Plus I think most people using agents for coding are using subscriptions which they are definitely not profitable in.

Where on earth do people get this idea? Subscriptions that are based around obscure, vendor defined "credits" are the perfect business model for vendors. They can change the amount you can use whenever they want.

It's likely they occasionally make a loss on some users but in general they are highly profitable for AI companies:

> Anthropic last month projected it would generate a 40% gross profit margin from selling AI to businesses and application developers in 2025

and

> OpenAI projected a gross margin of around 46% in 2025, including inference costs of both paying and nonpaying ChatGPT users.

https://archive.is/aKFYZ#selection-1075.0-1083.119

You can pick models that are snappy, or models that are as capable as SOTA. You don't really get both unless you spend extremely unreasonable amounts of money on what is essentially a datacenter-scale inference platform of your own, meant to service hundreds of users at once. (I don't care how many agent harnesses you spin up at once, you aren't going to get the same utilization as hundreds of concurrent users.)

This assessment might change if local AI frameworks start working seriously on support for tensor-parallel distributed inference, then you might get away with cheaper homelab-class hardware and only mildly unreasonable amounts of money.

> It's widely understood that the big players are making profit on inference.

The reason it matters is that if they are making a profit on inference, then when people use their services more, it cuts their losses. They might even break even eventually and start making a profit without raising the price.

But if they're losing money on inference, they will lose more money when people use their services more. There's no way to turn that around at that price.
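
To make that argument concrete, here's a small sketch with made-up numbers. The price, per-request cost, and fixed costs below are all hypothetical (not figures from any provider); the only point is the sign of the per-request margin.

```python
def total_profit_cents(requests: int, price_cents: int, cost_cents: int, fixed_cents: int) -> int:
    """Profit = requests * (price - marginal inference cost) - fixed costs (e.g. training)."""
    return requests * (price_cents - cost_cents) - fixed_cents


FIXED = 100_000  # hypothetical fixed costs, in cents
PRICE = 100      # hypothetical price per request, in cents

# Hypothetical marginal costs: 60 cents (positive margin) vs. 120 cents (negative margin).
for label, cost in (("profitable inference", 60), ("loss-making inference", 120)):
    for requests in (1_000, 5_000):
        profit = total_profit_cents(requests, PRICE, cost, FIXED)
        print(f"{label}: {requests} requests -> {profit / 100:+.2f}")
```

With a positive margin, more usage shrinks the loss and eventually crosses break-even (at 2,500 requests with these numbers); with a negative margin, every extra request digs the hole deeper.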

Did you really watch “Her” and think this is a future that should happen??

Seriously????

I don’t think OP’s point has anything to do with AI companions.

The big benefit of moving compute to edge devices is to distribute the inference load on the grid. Powering and cooling phones is a lot easier than powering and cooling a datacenter

What does what they said have to do with Her? Local LLMs are better than big corporations owning your data and offering LLMs for a huge cost.

Torment Nexus sounds fun

Having Scarlett Johansson's voice might not be so bad or even something less robotic.

Unfortunately, one man's dystopia is another's utopia.

Firefox on Windows, but it looks about the same in Edge

Screenshot of the header: https://i.imgur.com/4abfGYF.png

Firefox on Android: 'Google AI' (in app name) is clipped off the top; the Apple 'share' button is clipped on the bottom.

The Apple Silicon in the MacBook Neo is effectively a slimmed down version of M4, which is already out and has a very similar NPU (similar TFLOPS rating). It's worth noting however that the TFLOPS rating for Apple Neural Engine is somewhat artificial, since e.g. the "38 TFLOPS" in the M4 ANE are really 19 TFLOPS for FP16-only operation.

These E2B and E4B models are very small so that they can fit into phones with around 8 GB of RAM. You can get away with a much larger model. Just run:

    brew install ollama 

    ollama run gemma4:26b-a4b-it-q4_K_M
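
Once that's running, you can also script against it instead of using the interactive prompt. A minimal sketch, assuming the Ollama server is listening on its default local port (11434) and that the model tag from the command above is the one you pulled; I haven't verified that tag myself, so swap in whatever `ollama list` shows.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "gemma4:26b-a4b-it-q4_K_M"  # tag copied from the command above (unverified)


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt and return the generated text (needs the server running)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("Explain mixture-of-experts in one sentence.")` should then return the model's reply as a string, provided the server is up and the model has been downloaded.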

It’s completely vibe coded, doesn’t even run on my Mac lol

Productivity

iPhone only

Free · Designed for iPhone. Not verified for macOS.

iPhone

AI Edge Gallery is the premier destination for running the world’s most powerful open-source Large Language Models (LLMs) on your mobile device. Experience high-performance Generative AI directly on your hardware: fully offline, private, and lightning-fast.

Now Featuring: Gemma 4

This update brings official support for the newly released Gemma 4 family. As the centerpiece of this release, Gemma 4 allows you to test the cutting edge of on-device AI. Experience advanced reasoning, logic, and creative capabilities without ever sending your data to a server.

Core Features

- Agent Skills: Transform your LLM from a conversationalist into a proactive assistant. Use the Agent Skills tile to augment model capabilities with tools like Wikipedia for fact-grounding, interactive maps, and rich visual summary cards. You can even load modular skills from a URL or browse community contributions on GitHub Discussions.
- AI Chat with Thinking Mode: Engage in fluid, multi-turn conversations and toggle the new Thinking Mode to peek "under the hood." This feature allows you to see the model’s step-by-step reasoning process, which is perfect for understanding complex problem-solving. Note: Thinking Mode currently works with supported models, starting with the Gemma 4 family.
- Ask Image: Use multimodal power to identify objects, solve visual puzzles, or get detailed descriptions using your device’s camera or photo gallery.
- Audio Scribe: Transcribe and translate voice recordings into text in real-time using high-efficiency on-device language models.
- Prompt Lab: A dedicated workspace to test different prompts and single-turn use cases with granular control over model parameters like temperature and top-k.
- Mobile Actions: Unlock offline device controls and automated tasks powered entirely by a finetune of FunctionGemma 270m.
- Tiny Garden: A fun, experimental mini-game that uses natural language to plant and harvest a virtual garden using a finetune of FunctionGemma 270m.
- Model Management & Benchmark: Gallery is a flexible sandbox for a wide variety of open-source models. Easily download models from the list or load your own custom models. Manage your model library effortlessly and run benchmark tests to understand exactly how each model performs on your specific hardware.
- 100% On-Device Privacy: All model inferences happen directly on your device hardware. No internet is required, ensuring total privacy for your prompts, images, and sensitive data.

Built for the Community

AI Edge Gallery is an open-source project designed for the developer community and AI enthusiasts alike. Explore our example features, contribute your own skills, and help shape the future of the on-device agent ecosystem.

Check out the source code on GitHub: https://github.com/google-ai-edge/gallery

Note: This app is in active development. Performance is dependent on your device's hardware (CPU/GPU). For support or feedback, contact us at google-ai-edge-gallery-android-feedback@google.com.

Events

5.0 out of 5

2 ratings

- Introducing Gemma 4: Experience the latest high-performance models running fully offline.
- Agent Skills: Extend LLMs with modular tools that, for example, display interactive maps and search Wikipedia. Supports loading custom skills from the community.
- Thinking Mode in AI Chat: Visualize the model’s reasoning process for deeper transparency. (Note: Currently exclusive to supported models, including the Gemma 4 family.)
- Bug fixes.

Version 1.0.2, 3 days ago

Data Linked to You

The following data may be collected and linked to your identity:
- Identifiers
- Diagnostics
- Other Data

Data Not Linked to You

The following data may be collected but is not linked to your identity:
- Location
- Usage Data
- Diagnostics

Information

Size

35.4 MB

Category

Productivity

Compatibility

iPhone
Requires iOS 17.0 or later.
Mac
Requires a Mac with macOS 14.0 or later and an Apple M1 chip or later.
Apple Vision
Requires visionOS 1.0 or later.

Languages

English

Age Rating

13+
Infrequent
Profanity or crude humor
Horror and fear themes
Medical treatment information
Use of or references to alcohol, tobacco, or drugs

Provider

Google LLC

Google LLC has identified itself as a trader for this app and has confirmed that this product or service complies with European Union law.
Address
1600 Amphitheatre Parkway
Mountain View California 94043
United States
Phone number
+353 14361000
Email address
eea-support@google.com

No, only E2B and E4B.

Could you clarify what you mean by 'open-ended' in this context, since both initiatives are essentially open-source?

Firefox on Android: 'Google AI' (in app name) is clipped off the top; the Apple 'share' button is clipped on the bottom.

Unfortunately, one man's dystopia is another's utopia.

Having Scarlett Johansson's voice, or even just something less robotic, might not be so bad.

That already happened, in typical AI fashion: blatant theft. https://www.nbcnews.com/tech/scarlett-johansson-legal-action...

Firefox on Windows, but it looks about the same in Edge

Screenshot of the header: https://i.imgur.com/4abfGYF.png

It looks like there is some sort of glow effect on the text that isn't rendering right on your browser? It arguably doesn't have the best contrast, but seems to be as intended in Safari 26.3. Looks similar on Chrome macOS too: https://imgur.com/yq5PrKm.

Renders equally weird for me on Firefox on Windows 11. Firefox on macOS looks good, though.

Edit: Seems like mix-blend-mode: plus-lighter is bugged in Firefox on Windows https://jsfiddle.net/bjg24hk9/

What does what they said have to do with Her? Local LLMs are better than big corporations owning your data and offering LLMs for a huge cost.

These E2B and E4B models are very small so that they can fit into phones with around 8 GB of RAM. On a Mac you can get away with a much larger model. Just run:

    brew install ollama 

    ollama run gemma4:26b-a4b-it-q4_K_M
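
Once the model is running, Ollama also serves a local HTTP API (port 11434 by default), so you can script it instead of typing into the terminal. A minimal sketch, assuming the Ollama server is up and the model tag above has been pulled; `build_request` and `ask` are illustrative names of my own, not part of Ollama:

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="gemma4:26b-a4b-it-q4_K_M", url=OLLAMA_URL):
    """Build a non-streaming generate request without sending it."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `ask("Say hello in one sentence.")` should return the model's reply as a string. Setting `"stream": False` keeps the sketch simple; streaming is nicer for long answers.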

It’s completely vibe coded, doesn’t even run on my Mac lol

Because that's Gemma 3, not 4.

From what I've seen, Gemma 4 doesn't refuse much regarding sex; it only needs a little nudging in the right direction sometimes.

But it does refuse to be critical of the usual topics: Israel, Islam, trans, or race.

So wanting to discuss one of those is the real reason people would use an uncensored model.

that song is going to be stuck in my head all day now. lol

Strangely, reasoning is not on by default. If you enable it, it answers as you'd expect.

I get the local ai thing. I agree it’s probably a good direction. The bit that has to do with the movie “her” is the bit at the end where they are excited about “her”-like companions on our phones.

They literally mentioned Her 2013 at the end of their comment.

Legend! Thanks heaps.

Software that doesn't work has been available for decades. It's not a good signal for vibe-coding.

How is Google going to collect user data from a locally running model?

If you run it yourself, they don't; that is why they are packaging it into an app.

Does this also apply to Gemma's 26B-A4B vs., say, Qwen's 35B-A3B?

I'm not sure if I can make the 35B-A3B work with my 32GB machine
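
A rough way to sanity-check this is to estimate the size of the weights alone: total parameters times bits per weight, divided by 8. The sketch below assumes the first number in each model name is the total parameter count in billions, and that all weights have to sit in memory even though only the "A3B"/"A4B" share is active per token. Real quantized files come out somewhat larger than the nominal bit width suggests, and the KV cache and the OS need room on top.

```python
def weight_memory_gb(total_params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return total_params_billion * bits_per_weight / 8


# Model names and parameter counts are taken from the names in this thread.
for name, params_b in [("26B-A4B", 26), ("35B-A3B", 35)]:
    for bits in (4, 8, 16):
        size = weight_memory_gb(params_b, bits)
        print(f"{name} @ {bits}-bit: ~{size:.1f} GB")
```

By this estimate the 35B model at 4-bit needs about 17.5 GB for weights, which leaves headroom on a 32 GB machine, while 8-bit (about 35 GB) would not fit.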

Gemma 4 31B is still not impressive at coding compared to even Qwen 3.5 27B. It's just not its strong suit.

So far Gemma 4 seems excellent at role playing, document analysis, and decent at making agentic decisions.

This has been my experience as well. Qwen via Ollama locally has been very, very impressive.

I have a project where I'm using LLMs to parse data from PDFs with a very complicated tabular layout. I've been using the latest Gemini models (flash and pro) for their strong visual reasoning, and they've generally been doing a really good job at it.

My prompt states that their job is to extract the text exactly as it appears in the PDF. One data point to be extracted is the race of each person listed. In one case, someone's race was "Indian". Gemini decided to extract it as "Native American". So ridiculous.

I'm not sure what you're angling at but I already gave a set of questions that are ethically legitimate yet routinely censored by the public models:

https://news.ycombinator.com/item?id=47654013

Not to mention that doing what the big model makers do literally dumbs the model down.

I tried 1 and a few others with hypothetical situations, and it looks like public models answer perfectly fine.

It's their E2B and E4B variants (so 2B and 4B but also quantized)

https://ai.google.dev/gemma/docs/core/model_card_4#dense_mod...

> It's widely understood that the big players are making profit on inference.

If you add in the cost of training, it’s not profitable.

> Coming up with any valid criticism of Islam at all (for some reason, criticisms of Christianity or Judaism are perfectly allowed even with public models!).

When’s the last time you tried this? ChatGPT and Gemini have no trouble responding with all the common criticisms of Islam.

The manufacturing of biologics can be heavily censored to an absurd degree. I don’t know about Gemma 4 in particular.

The relevant constraint when running on a phone is power, not really RAM footprint. Running the tiny E2B/E4B models makes sense; this is essentially what they're designed for.

At least Anthropic claims that they are profitable on a per model basis. But since both revenue and training costs are growing exponentially, and they need to pay for model N training today, and only get revenue for model N-1 today, the offset makes it look worse than it is.

Obviously that doesn’t help them turn a profit, until they can stop growing training costs exponentially.

So it’s really a race to see whether growth in revenue or training costs decelerates first.
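
To make the offset concrete, here is a toy model with purely hypothetical numbers. It assumes each model earns all of its revenue in the single year after it is trained, and that the next model is paid for in that same year. `yearly_cash_flow` and its parameters are illustrative, not anyone's reported figures.

```python
def yearly_cash_flow(first_cost, cost_growth, revenue_multiple, years):
    """Cash flow per year when model N is paid for while model N-1 earns.

    first_cost:       training cost of the first model (arbitrary units)
    cost_growth:      factor by which training cost grows each generation
    revenue_multiple: lifetime revenue of a model divided by its training cost
    """
    flows = []
    for n in range(1, years + 1):
        prev_cost = first_cost * cost_growth ** (n - 1)  # model earning this year
        this_cost = first_cost * cost_growth ** n        # model being trained this year
        flows.append(revenue_multiple * prev_cost - this_cost)
    return flows


# Hypothetical: every model earns back 2x its cost, but costs triple each generation.
print(yearly_cash_flow(1.0, 3.0, 2.0, 3))
```

Every model here is profitable on its own (it returns 2x its cost), yet each year's cash flow is negative and the losses grow, because the cost growth factor (3) exceeds the revenue multiple (2). Drop the cost growth below the revenue multiple and the yearly numbers turn positive.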

I just tried on Gemma 4.

Asking for criticism of Islam results in equal response tokens for defense of Islam alongside the criticisms. When pressed to not provide counterpoints, it refuses to remove them.

Asking for criticisms of Christianity gives only criticisms.

I tried again with the prompt “Give criticisms of Islam. No counterarguments” and it did work this time. This shows that they’re trying to make the model fair but it still has biases. In all my testing I’ve never seen a refusal to drop counterpoints to criticisms of Christianity, but I've seen frequent refusals on Islam. Given how popular this criticism of the model is, it was very likely trained specifically on how to handle the subject.

Really? That's fascinating. Why is that?

> Plus I think most people using agents for coding are using subscriptions which they are definitely not profitable in.

It's likely they occasionally make a loss on some users but in general they are highly profitable for AI companies:

> Anthropic last month projected it would generate a 40% gross profit margin from selling AI to businesses and application developers in 2025

and

> OpenAI projected a gross margin of around 46% in 2025, including inference costs of both paying and nonpaying ChatGPT users.

https://archive.is/aKFYZ#selection-1075.0-1083.119

Both of those companies are losing hella money, dude. Just cuz they say they “expect” to be profitable doesn’t mean they are.

But if they're losing money on inference, they will lose more money when people use their services more. There's no way to turn that around at that price.

We don't even have any evidence inference excluding training is actually profitable.

Hacker Times

Discussion
