Social media can be intellectually stimulating and educational, but it's also easy to get sucked into ideological sniping and flamewars, even if you didn't go looking for it. The emotional and intellectual energy spent flaming strangers on the Internet is a complete waste of human capital.
With an API like this, I assume you could have a browser extension that could de-snarkify content before showing it to you. You could ask the LLM to preserve all factual content from the post, but to de-claw any aggressive or snarky language. If you really wanted to have fun, you could ask it to turn anything written in an aggressive tone into something that sounds absurd or incompetent, so that the more aggressive the post, the more it would make the author look silly.
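As a rough sketch (assuming the Prompt API discussed here; the system prompt and the selector are made up for illustration):

const session = await LanguageModel.create({
  initialPrompts: [{
    role: 'system',
    content: 'Rewrite the text you are given to remove snark and aggression ' +
      'while preserving every factual claim. Output only the rewrite.',
  }],
});

// '.post-body' is an illustrative selector for whatever site you're filtering.
for (const post of document.querySelectorAll('.post-body')) {
  post.textContent = await session.prompt(post.textContent);
}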
This could have a double benefit. For the reader, it insulates them from the personal attacks of random strangers on the Internet. Don't get me wrong, there is a time and a place for real, charged arguments about important issues that affect us all. But there is little to be gained from having those fights with strangers; on the contrary, I think it poisons the body politic when strangers are screaming at each other.
For the writer, it takes away any incentive to be snarky or rude. If other people filter their content this way, there's no point in trying to be mean to them, and no "race to the bottom" for who can be more nasty.
But keep in mind the actual experience for users is not great; the model download is orders of magnitude greater than downloading the browser itself, and something that needs to happen before you get your first token back. That's unfixable until operating systems start reliably shipping their own prebaked models that an API like this could plug into.
It would actually be pretty interesting to see if it's possible to decentralize the compute: break a larger prompt down and send the pieces to a bunch of browsers using a subagent pattern or something like RLM, each working on a smaller part of the prompt.
While many AI integrations focus on text communication and chat-style interfaces, a lot of software benefits from non-text interfaces.
I believe at some point OSes and browsers should provide an API to manage models so you'll have access to on-device/remote ones with a simplified interface for the app. Making something standardized that is cross-platform would be fantastic. It also needs to be on mobile devices, so the players that can easily make it happen are mostly Apple and Google. (Meta will follow or vice-versa I guess)
Key point: it shouldn't be exclusive to promoted models.
(1) https://developer.apple.com/documentation/foundationmodels
So the app would be able to query and get the right model(s).
It's a tiny script that looks up the RSS feed and uses the content to generate summaries; quite a nice fit with our static site. Sometime I'd like to extend it to ask different questions about the content.
Also, what is toxic to one person is not toxic to another, depending on their subjective choices. How will you solve for this without everyone just seeing what they want to see, even when reality isn't like that? I feel that will just exacerbate the problems of social media rather than reduce them.
It kind of falls apart when you start to think of edge cases rather than stopping at the "hey, this tool will keep morons off my feed!" mentality.
If you want to do anything interesting you need transformers.js and a decent model. Qwen 0.9B is where things start working usefully.
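For example, a minimal setup looks roughly like this (the model ID is just an example; check what's actually published):

import { pipeline } from '@huggingface/transformers';

// Any small instruct model that fits in browser memory will do here.
const generator = await pipeline('text-generation', 'onnx-community/Qwen2.5-0.5B-Instruct');

const output = await generator(
  [{ role: 'user', content: 'Summarize this page in one sentence: ...' }],
  { max_new_tokens: 128 },
);
console.log(output[0].generated_text.at(-1).content);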
I haven't pushed out a full version[1] which uses ducklake-wasm + this to make a completely local SQL answering machine, but for now all it does is retype prompts in the browser.
I'm not particularly happy about that outcome as I wish we had more locally run AI models for reasons of privacy and efficiency, so this is more just a warning that at present there are some severe tradeoffs.
1 - https://sendcheckit.com/blog/ai-powered-subject-line-alterna...
Or a LocalNet API that integrates with trusted hardware devices on your local network. As a trial (Chrome beta programme, strictly limited, but here are 3x signup links to share with your friends) you can adjust your Google Nest Mini underfloor heating directly from Chrome!
Or a DirectCast API that lets you stream <video> elements to a device of your choice even over a VPN. As a Chrome trial, you can use your Google Cloud account to stream directly from YouTube Premium to any linked Google Chromecast devices you own!
- Gemini Nano-1: 46% MMLU, 1.8B
- Gemini Nano-2: 56% MMLU, 3.25B
- Gemma4 E2B: 60.0% MMLU, 2.3B
- Gemma4 E4B: 69.4% MMLU, 4.5B
Sources:
- https://huggingface.co/google/gemma-4-E2B-it
- https://android-developers.googleblog.com/2024/10/gemini-nan...
But... It's the type of idea that is unpredictable as it comes into contact with reality. If it works, it probably works very differently from the initial idea of how it will work.
My wish list:
- Eliminate ALL clickbait titles and ads. I only want to see a dry factual title.
- For any given topic, I only care about the main article (with the option to see only a summary, unless it's a high-quality blog) and a couple of substantive comments; the rest is junk I don't want to see.
The current state of popular social media sites means I don't use them at all (except HN, which is trending in the same direction due to saturation with AI), but every other week or so I still end up wasting a few hours, which I'd like to avoid entirely.
Ideally this would lead to 98% of content being filtered or summarised away, and over time I'd only use the internet for looking things up with intention. I want this to remove the majority of the "entertainment" value of the internet (by default) so that time and energy can be refocused on real life and high-quality sources (books) only.
Maybe the next big thing will be some software subscription premium offers with a bunch of 5090s as an extra.
With MoE models, you could fetch expert layers from the network on demand by issuing HTTP range queries for the corresponding offset, similar to how bittorrent downloads file chunks from multiple hosts. You'd still have to download shared layers, but time to first token would now be proportional to active-size rather than total-size. Of course this wouldn't be totally "offline" inference anymore, but for a web browser feature that's not a key consideration.
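A minimal sketch of the fetch side, assuming you already have an index mapping experts to byte offsets within a hosted weights file (both hypothetical here):

// Fetch one expert's weights via an HTTP range request.
async function fetchExpert(url, { start, end }) {
  const res = await fetch(url, {
    headers: { Range: `bytes=${start}-${end}` },
  });
  if (res.status !== 206) {
    throw new Error('Server does not support range requests');
  }
  return new Uint8Array(await res.arrayBuffer());
}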
Here's hoping that dystopia never happens.
fantastic!
> the model download is orders of magnitude greater than downloading the browser itself, and something that needs to happen before you get your first token back
sure, but does this mean the model is lazily downloaded? that is, if I used this and it was the first time the model was called, would the user be waiting until the model finished downloading at that point?
that sounds like a horrible user experience - maybe Chrome reduces the confusion by showing a download dialog, a status indicator, or similar?
also, any idea what the on disk impact is?
Plus, even if you really wanted to do that, WebGPU exists and has for a while, right?
And do you guys communicate with other browsers when doing something like this, to try to settle on something common? I don't mean the W3C, but practically - it's a small world after all.
I want the option to engage with the substance of new developments in the world, technology, etc. without the drama. I don't want to be drawn into the drama of strangers (who could, for all I know, just be bots or ragebaiting AIs).
If I want drama, there's plenty of it on TV, or I could talk to my friends about what is going on with people I actually know.
The anti-pattern, in my mind, is logging on to engage with substantive content and to be inadvertently drawn into flamewars with strangers.
So it's once per browser, not once per site.
You can track the download state yourself and pop whatever UI you want.
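Roughly like this, using the monitor callback from the docs (progressBar is a hypothetical <progress max="1"> element):

const session = await LanguageModel.create({
  monitor(m) {
    m.addEventListener('downloadprogress', (e) => {
      progressBar.value = e.loaded; // 0 to 1
    });
  },
});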
Sure, you might say this sort of thing is boiling flavor out of your food, but... boiling the bacteria out of what you consume isn't a bad thing.
This is a common misconception, probably due to the unfortunate naming. Expert layers are not "expert" at any particular subject, and active-size only refers to the activated layers per token. You'd still need all (or most of all) the layers for any particular query, even if some layers have a very low chance of being activated.
All in all, you'd be better off with lazy loading the entire model, at least you'd know you have the capability to run inference from then on.
Edit: a simple example is a spam bot
> This feels like a lot of work for low reward
Low per-device reward combined with a high user count - either from large legitimate players or from botnets - has been the monetisation strategy of most online enterprises. If it turns out useful enough, I'm sure browsers will just start including it as a (perhaps optional?) part of installation.
The target usage for the prompt API is anything that would benefit from the general capabilities of a language model, and can't be encompassed by the more-specific APIs for summarization/writing/rewriting. Realistic use cases currently are things like sentiment analysis, keyword extraction, etc. I have a number of ideas on how to integrate it into my current retirement project around Japanese flashcards, e.g. generating example sentences. If the small (~10 GiB) model class keeps getting smarter, the class of things possible on-device in this way gets larger and larger over time.
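For example, the flashcard idea might look roughly like this (prompts are illustrative; the boolean-schema trick follows the structured-output pattern from the docs):

const session = await LanguageModel.create();

// Generate an example sentence for a vocabulary word.
const sentence = await session.prompt(
  'Write a short Japanese example sentence using the word 勉強, with an English translation.',
);

// Sentiment analysis via structured output.
const isPositive = JSON.parse(await session.prompt(
  'Is the sentiment of this review positive?\n\nGreat value, would buy again.',
  { responseConstraint: { type: 'boolean' } },
));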
We definitely communicated with other browsers. There were the standing WebML Community Group meetings at the W3C every few weeks. There were async discussions like https://github.com/mozilla/standards-positions/issues/1213 and https://github.com/WebKit/standards-positions/issues/495 . (Side note, I love the contrast between Mozilla's helpful in-depth feedback and WebKit's... less helpful feedback.) There was also a bit of a debacle where the W3C Technical Architecture Group tried to give "feedback" but the feedback ended up being AI-generated slop... https://github.com/w3ctag/design-reviews/issues/1093 .
But overall, yeah, the goal with the prompt API, as with all web APIs, is to put something out there for discussion as early as possible, and get input from the broad community, especially including other browsers, to see if it's something that they are interested in collaborating on. https://www.chromium.org/blink/guidelines/web-platform-chang... (which I also wrote) goes into how the Chromium project thinks about such collaboration in general.
I can manually “hold” emails so they don’t go in the “sort out my email” woodchipper. It’s been life-changing.
Note that the article here was last updated 2025-09-21, and as of that time it was already on Gemini Nano 3.
There are a lot of ways this API could go, e.g. more powerful models eventually, or perhaps integration with cloud models. For example, I could see Google trying to make Gemini the default model for users signed into Chrome.
I see the merit in such a proposal. It's the linguistic equivalent to boiling the food you consume, instead of eating it raw with all the associated bad stuff.
The problem is, as you said, that this plan is unlikely to be as rosy as it's portrayed and probably has a lot of drawbacks in real life.
Interesting to think about and explore, though.
> Built-in models should be significantly smaller. The exact size may vary slightly with updates.

Yes, I can read and comprehend English, and you should assume I read the page. Because of the "At least" wording, I was curious what a person who has actually used the feature has noticed, aka learning from people who have actually done it already.
As for cloud models, that would be interesting, although I guess the fraud would then be easier: spoofing whatever parameters (IP address? domain name? some Chrome install identifier?) to get around whatever rate limiting they come up with, rather than actually using people's computers.
Anyways I’m sure if it ends up being abused, they can throw a permissions dialog in front of it. Just need to figure out a way to make normal people understand.
I mean... you would basically be taking a complex thing, transforming it, and reconstructing it. What we want out of social media isn't a simple, legible function. The positives? You'd have to discover them.
If someone starts building with the initial idea above, my guess is that they'd end up with some sort of custom feed that draws inspiration and inputs from social media... but isn't social media. It's something else that you can scroll, read, and whatnot.
Then it's possible the model you get will scale with the CPU/GPU/RAM available, so if you have a 12GB GPU you probably get a better model, perhaps that's a 10-11GB model? At 2x that's 22GB.
Then consider that a machine is not static, GPUs/hardware come and go, VRAM allocation in integrated graphics changes, etc. You end up with just needing to pick a number and not confuse users.
> Just need to figure out a way to make normal people understand.
Has that strategy ever actually worked?

I want to go to news.ycombinator.com/reddit.com/etc on any given day and just see a couple of paragraphs and maybe a few reference links to follow if I so choose. Spend a few minutes reading that and close it.
All of that in the hope of diverting my limited time/energy on Earth to endeavours in real life with real people.
I would rather pay money than watch this thing run in my browser printing 5 tps on high-end consumer hardware.
This is part of it, and also we just didn't want to use up the last of the user's disk space! It's disrespectful to use up 3 GB if the user only has 4 GB left; it's sketchy if the user only has 10 GB. At 22 GB, we felt there was more room to breathe.
One could argue that users should have more agency and transparency into these decisions, and for power users I agree... some kind of neato model management UI in chrome://settings would have been cool. But 99% of users would never see that, so I don't think it ever got built.


Published: May 20, 2025, Last updated: September 21, 2025
| Explainer | Web | Extensions | Chrome Status | Intent |
|---|---|---|---|---|
| GitHub | View | Intent to Experiment | ||
| GitHub | View | Intent to Experiment |
With the Prompt API, you can send natural language requests to Gemini Nano in the browser.
There are many ways you can use the Prompt API. For example, you could build:
These are just a few possibilities, and we're excited to see what you create.
The following requirements exist for developers and the users who operate features using these APIs in Chrome. Other browsers may have different operating requirements.
The Language Detector and Translator APIs work in Chrome on desktop. These APIs do not work on mobile devices.
The Prompt API, Summarizer API, Writer API, Rewriter API, and Proofreader API work in Chrome when the following conditions are met:
Gemini Nano's exact size may vary as the browser updates the model. To determine the current size, visit chrome://on-device-internals.
The Prompt API uses the Gemini Nano model in Chrome. While the API is built into Chrome, the model is downloaded separately the first time an origin uses the API. Before you use this API, acknowledge Google's Generative AI Prohibited Uses Policy.
To determine if the model is ready to use, call LanguageModel.availability().
const availability = await LanguageModel.availability({
  // The same options in prompt() or promptStreaming()
});
To trigger the download and instantiate the language model, check for user activation. Then, call the create() function.
const session = await LanguageModel.create({
  monitor(m) {
    m.addEventListener('downloadprogress', (e) => {
      console.log(`Downloaded ${e.loaded * 100}%`);
    });
  },
});
If the response to availability() was downloading, listen for download progress and inform the user, as the download may take time.
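Putting availability() and create() together, a typical first-run flow might look like this sketch (the UI helpers are hypothetical):

const availability = await LanguageModel.availability();

if (availability === 'unavailable') {
  // Hide the feature or fall back to a server-side model.
} else {
  if (availability !== 'available') {
    showDownloadNotice(); // hypothetical: tell the user a download is starting
  }
  const session = await LanguageModel.create({
    monitor(m) {
      m.addEventListener('downloadprogress', (e) => {
        updateProgress(e.loaded); // hypothetical progress UI
      });
    },
  });
}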
All of the built-in AI APIs are available on localhost in Chrome. Set the following flags to Enabled:
- chrome://flags/#optimization-guide-on-device-model
- chrome://flags/#prompt-api-for-gemini-nano-multimodal-input

Then click Relaunch or restart Chrome. If you encounter errors, troubleshoot localhost.
The params() function informs you of the language model's parameters. The object has the following fields:
- defaultTopK: The default top-K value.
- maxTopK: The maximum top-K value.
- defaultTemperature: The default temperature.
- maxTemperature: The maximum temperature.

// Only available when using the Prompt API for Chrome Extensions.
await LanguageModel.params();
// {defaultTopK: 3, maxTopK: 128, defaultTemperature: 1, maxTemperature: 2}
Once the Prompt API can run, you create a session with the create() function.
const session = await LanguageModel.create();
When you use the Prompt API for Chrome Extensions, each session can be customized with topK and temperature using an optional options object. The default values for these parameters are returned from LanguageModel.params().
// Only available when using the Prompt API for Chrome Extensions.
const params = await LanguageModel.params();

// Initializing a new session must either specify both topK and
// temperature or neither of them.
// Only available when using the Prompt API for Chrome Extensions.
const slightlyHighTemperatureSession = await LanguageModel.create({
  // Clamp to maxTemperature rather than exceeding it.
  temperature: Math.min(params.defaultTemperature * 1.2, params.maxTemperature),
  topK: params.defaultTopK,
});
The create() function's optional options object also takes a signal field, which lets you pass an AbortSignal to destroy the session.
const controller = new AbortController();
stopButton.onclick = () => controller.abort();
const session = await LanguageModel.create({
  signal: controller.signal,
});
With initial prompts, you can provide the language model with context about previous interactions, for example, to allow the user to resume a stored session after a browser restart.
const session = await LanguageModel.create({
  initialPrompts: [
    { role: 'system', content: 'You are a helpful and friendly assistant.' },
    { role: 'user', content: 'What is the capital of Italy?' },
    { role: 'assistant', content: 'The capital of Italy is Rome.' },
    { role: 'user', content: 'What language is spoken there?' },
    {
      role: 'assistant',
      content: 'The official language of Italy is Italian. [...]',
    },
  ],
});
You can add an "assistant" role, in addition to previous roles, to elaborate on the model's previous responses. For example:
const followup = await session.prompt([
  { role: "user", content: "I'm nervous about my presentation tomorrow" },
  { role: "assistant", content: "Presentations are tough!" },
]);
In some cases, instead of requesting a new response, you may want to prefill part of the "assistant"-role response message. This can be helpful to guide the language model to use a specific response format. To do this, add prefix: true to the trailing "assistant"-role message. For example:
const characterSheet = await session.prompt([
  {
    role: 'user',
    content: 'Create a TOML character sheet for a gnome barbarian',
  },
  {
    role: 'assistant',
    content: '```toml\n',
    prefix: true,
  },
]);
The Prompt API has multimodal capabilities and supports multiple languages. Set the expectedInputs and expectedOutputs modalities and languages when creating your session.
- type: The modality expected.
  - For expectedInputs, this can be text, image, or audio.
  - For expectedOutputs, the Prompt API allows text only.
- languages: An array to set the language or languages expected. The Prompt API accepts "en", "ja", and "es". Support for additional languages is in development.
  - For expectedInputs, set the system prompt language and one or more expected user prompt languages.
  - Set one or more expectedOutputs languages.

const session = await LanguageModel.create({
  expectedInputs: [
    { type: "text", languages: ["en" /* system prompt */, "ja" /* user prompt */] },
  ],
  expectedOutputs: [
    { type: "text", languages: ["ja"] },
  ],
});
You may receive a "NotSupportedError" DOMException if the model encounters an unsupported input or output.
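A sketch of defensive handling for that case:

let session;
try {
  session = await LanguageModel.create({
    expectedInputs: [{ type: 'audio' }],
  });
} catch (e) {
  if (e.name === 'NotSupportedError') {
    // The model can't handle this input or output; fall back to text only.
  } else {
    throw e;
  }
}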
With these capabilities, you could, for example, let users prompt the model with audio recordings or get AI feedback on images they create.
Take a look at the Mediarecorder Audio Prompt demo for using the Prompt API with audio input and the Canvas Image Prompt demo for using the Prompt API with image input.
The Prompt API supports the following input types:
- HTMLImageElement
- SVGImageElement
- HTMLVideoElement (uses the video frame at the current video position)
- HTMLCanvasElement
- ImageBitmap
- OffscreenCanvas
- VideoFrame
- Blob
- ImageData

This snippet shows a multimodal session that first processes two visuals (one image Blob and one HTMLCanvasElement) and has the AI compare them, and then lets the user respond with an audio recording (as an AudioBuffer).
const session = await LanguageModel.create({
  expectedInputs: [
    { type: "text", languages: ["en"] },
    { type: "audio" },
    { type: "image" },
  ],
  expectedOutputs: [{ type: "text", languages: ["en"] }],
});
const referenceImage = await (await fetch("reference-image.jpeg")).blob();
const userDrawnImage = document.querySelector("canvas");
const response1 = await session.prompt([
  {
    role: "user",
    content: [
      {
        type: "text",
        value:
          "Give a helpful artistic critique of how well the second image matches the first:",
      },
      { type: "image", value: referenceImage },
      { type: "image", value: userDrawnImage },
    ],
  },
]);
console.log(response1);
const audioBuffer = await captureMicrophoneInput({ seconds: 10 });
const response2 = await session.prompt([
  {
    role: "user",
    content: [
      { type: "text", value: "My response to your critique:" },
      { type: "audio", value: audioBuffer },
    ],
  },
]);
console.log(response2);
Inference may take some time, especially when prompting with multimodal inputs. It can be useful to send predetermined prompts in advance to populate the session, so the model can get a head start on processing.
While initialPrompts are useful at session creation, the append() method can be used in addition to the prompt() or promptStreaming() methods to give additional contextual prompts after the session is created.
For example:
const session = await LanguageModel.create({
  initialPrompts: [
    {
      role: 'system',
      content:
        'You are a skilled analyst who correlates patterns across multiple images.',
    },
  ],
  expectedInputs: [{ type: 'image' }],
});
fileUpload.onchange = async () => {
  await session.append([
    {
      role: 'user',
      content: [
        {
          type: 'text',
          value: `Here's one image. Notes: ${fileNotesInput.value}`,
        },
        { type: 'image', value: fileUpload.files[0] },
      ],
    },
  ]);
};

analyzeButton.onclick = async (e) => {
  analysisResult.textContent = await session.prompt(userQuestionInput.value);
};
The promise returned by append() fulfills once the prompt has been validated, processed, and appended to the session. The promise is rejected if the prompt cannot be appended.
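For example, a sketch of handling a rejected append():

try {
  await session.append([
    { role: 'user', content: [{ type: 'image', value: fileUpload.files[0] }] },
  ]);
} catch (e) {
  // The prompt couldn't be appended, for example because it would
  // overflow the remaining context window.
  console.error('Could not append prompt:', e);
}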
To use structured output with the Prompt API, add the responseConstraint field to the prompt() or promptStreaming() options and pass a JSON Schema as its value.
In the following example, the JSON Schema makes sure the model responds with true or false to classify if a given message is about pottery.
const session = await LanguageModel.create();
const schema = {
"type": "boolean"
};
const post = "Mugs and ramen bowls, both a bit smaller than intended, but that
happens with reclaim. Glaze crawled the first time around, but pretty happy
with it after refiring.";
const result = await session.prompt(
`Is this post about pottery?\n\n${post}`,
{
responseConstraint: schema,
}
);
console.log(JSON.parse(result));
// true
Your implementation can include a JSON Schema or regular expression as part of the message sent to the model. This uses some of the context window. You can measure how much of the context window it will use by passing the responseConstraint option to session.measureContextUsage().
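A sketch of that measurement (the exact shape of the return value isn't specified here, so treat it as illustrative):

const usage = await session.measureContextUsage(
  'Is this post about pottery?',
  { responseConstraint: schema },
);
console.log(`Constraint and prompt will use ${usage} of ${session.contextWindow} tokens.`);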
You can avoid this behavior with the omitResponseConstraintInput option. If you do so, we recommend that you include some guidance in the prompt:
// Assumes `schema` is a JSON Schema for a { rating: number } object (defined elsewhere).
const result = await session.prompt(
  `Summarize this feedback into a rating between 0-5. Only output a JSON object { rating }, with a single property whose value is a number: The food was delicious, service was excellent, will recommend.`,
  { responseConstraint: schema, omitResponseConstraintInput: true },
);
You can prompt the model with either the prompt() or the promptStreaming() functions.
If you expect a short result, you can use the prompt() function that returns the response once it's available.
// Start by checking if it's possible to create a session based on the
// availability of the model, and the characteristics of the device.
const available = await LanguageModel.availability({
  expectedInputs: [{ type: 'text', languages: ['en'] }],
  expectedOutputs: [{ type: 'text', languages: ['en'] }],
});
if (available !== 'unavailable') {
  const session = await LanguageModel.create();

  // Prompt the model and wait for the whole result to come back.
  const result = await session.prompt('Write me a poem!');
  console.log(result);
}
If you expect a longer response, you should use the promptStreaming() function which lets you show partial results as they come in from the model. The promptStreaming() function returns a ReadableStream.
const available = await LanguageModel.availability({
  expectedInputs: [{ type: 'text', languages: ['en'] }],
  expectedOutputs: [{ type: 'text', languages: ['en'] }],
});

if (available !== 'unavailable') {
  const session = await LanguageModel.create();

  // Prompt the model and stream the result:
  const stream = session.promptStreaming('Write me an extra-long poem!');
  for await (const chunk of stream) {
    console.log(chunk);
  }
}
Both prompt() and promptStreaming() accept an optional second parameter with a signal field, which lets you stop running prompts.
const controller = new AbortController();
stopButton.onclick = () => controller.abort();
const result = await session.prompt('Write me a poem!', {
signal: controller.signal,
});
Each session keeps track of the context of the conversation. Previous interactions are taken into account for future interactions until the session's context window is full.
Each session has a maximum number of tokens it can process. Check your progress towards this limit with the following:
console.log(`${session.contextUsage}/${session.contextWindow}`);
It's possible to send a prompt that causes the context window to overflow. In such cases, the initial portions of the conversation with the language model will be removed, one prompt and response pair at a time, until enough tokens are available to process the new prompt. The exception is the system prompt, which is never removed.
Such overflows can be detected by listening for the contextoverflow event on the session:
session.addEventListener("contextoverflow", () => {
  console.log("We've gone past the context window, and some inputs will be dropped!");
});
If it's not possible to remove enough tokens from the conversation history to process the new prompt, then the prompt() or promptStreaming() call will fail with a QuotaExceededError exception and nothing will be removed. The QuotaExceededError has the following properties:
- requested: How many tokens the input consists of.
- contextWindow: How many tokens were available.

Learn more about session management.
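For example, a sketch of catching that failure (hugeInput stands in for an over-long prompt):

try {
  const result = await session.prompt(hugeInput);
} catch (e) {
  if (e.name === 'QuotaExceededError') {
    console.error(`Prompt needs ${e.requested} tokens; only ${e.contextWindow} available.`);
  } else {
    throw e;
  }
}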
To preserve resources, you can copy an existing session with the clone() function. This creates a fork of the conversation, where the context and initial prompt are preserved.
The clone() function takes an optional options object with a signal field, which lets you pass an AbortSignal to destroy the cloned session.
const controller = new AbortController();
stopButton.onclick = () => controller.abort();
const clonedSession = await session.clone({
signal: controller.signal,
});
Call destroy() to free resources if you no longer need a session. When a session is destroyed, it can no longer be used, and any ongoing execution is aborted. You may want to keep the session around if you intend to prompt the model often since creating a session can take some time.
await session.prompt(
  "You are a friendly, helpful assistant specialized in clothing choices."
);
session.destroy();
// The promise is rejected with an error explaining that
// the session is destroyed.
await session.prompt(
"What should I wear today? It is sunny, and I am choosing between a t-shirt
and a polo."
);
We've built multiple demos to explore the many use cases for the Prompt API. The following demos are web applications:
To test the Prompt API in Chrome Extensions, install the demo extension. The extension source code is available on GitHub.
The Prompt API for the web is still being developed. While we build this API, refer to our best practices on session management for optimal performance.
By default, the Prompt API is only available to top-level windows and to their same-origin iframes. Access to the API can be delegated to cross-origin iframes using the Permission Policy allow="" attribute:
<iframe src="https://cross-origin.example.com/" allow="language-model"></iframe>
The Prompt API isn't available in Web Workers for now, due to the complexity of establishing a responsible document for each worker in order to check the permissions policy status.
Your input can directly impact how we build and implement future versions of this API and all built-in AI APIs.