Sakana Fugu

You pay $200/month to Anthropic, $200/month to OpenAI, $200/month to Cursor, $200/month to $200/month to Google, and seeing that it didn't come to a nice round $1024/month, you pay $200/month to Sakana to coordinate it all, because why not.

While you're at it, feel free to send me $200 as well, I'll generate a crypto address ending with "AI".

I tried running this for some market research for my startup and it did a pretty nice job. It didn't necessarily find any obscure data, and it seemed to rely on older data than what I could find myself. On top of this, it had the same sycophantic tendencies as most LLMs these days (explaining why your idea is great and riffing on that), which I find to be unnecessary use of resources.

All put together, paying ~$60 to get a hit-or-miss report seems a bit excessive, but obviously as the models they use under the hood get better it becomes more and more worth it, assuming they also improve their grounding/search capabilities.

I'm a big fan of Sakana though, and have followed David Ha / @hardmaru since the world models papers (with the racing car game and the Doom clone), which were incredible at the time.

There are so many derisive comments here.

David Ha, CEO and co-founder, was one of the youngest managing director at Goldman Sachs before doing ML at Google. His ML publications were considered top-notch almost a decade ago. I had high hopes for him when he raised money and founded Sakana.

I do agree with some comments here that perhaps this particular product is not well thought out. I also agree with the criticism that David calls Sakana a frontier AI lab while making money just selling AI B2B applications to Japanese businesses. I also agree with the assessment that Sakana has abrasive and antagonistic, sometimes openly hostile, recruiting tactics. I also agree that his then-impressive publications may have lost their luster in the age of LLMs.

However, the man is clearly driven; and he and his team may have more to offer in future. I admire the man for not taking the conventional AI-research career path.

Looking at the technical report I'm a bit confused. The improvement from using their orchestrator models seems minimal (in some cases lower than just the model which I'm assuming is in the orchestrator's pool?). Maybe it's sort of acting as an additional reasoning step upfront? Sort of like how if you asked Claude to create a plan for how best to prompt itself, you would probably end up with a better result than just the base prompt.

Also, from the technical report, looks like they're training on the output of Claude Code, etc. I'm guessing this doesn't violate TOS because they're technically not a directly competing model. This brings me to what I see as the main risk with this service, which is that it seems like an easy thing for a frontier lab to make obsolete, either by models beginning to converge in terms of strengths or by improving their own harnesses to include more of this meta-reasoning.

Nice idea but expensive. It looks like they don’t add very low cost models like DeepSeek v4 flash into their mix.

After a few months of spending money on the best frontier models, now I am spending time using DeepSeek v4 flash as my workhorse, and flipping to more capable (but still very inexpensive) open models on an as-needed basis. We all make our own tool selection decisions, but for me, I feel happier and enjoy working more following the very fast response and ultra low cost path.

As a developer outside the US I think it's vital to have alternatives to OpenAI and Anthropic, but sadly this is not it. For $200/month you get < 3 hours of use per week, the API is extremely slow, and the output quality in my tests is nowhere near Fable. It's nowhere remotely near usable as a day-to-day workhorse. Very disappointing.

https://x.com/cortesi/status/2068898694238486658

How do you configure it to run with pi or claude code? I'm curious to try it (via subscription ideally)

EDIT: Found something here https://dev.classmethod.jp/en/articles/sakana-fugu-ga-first-...

Fugu, eh? So there’s a nonzero chance this thing might kill me?

Beta user: they piloted OpenRouter fusion before it was seen as the viable step. Everyone's understood for months now that having different models check each other is the best path forward.

This gets you that in a nice neat package, without the underlying tinkering mechanics.

If (big iff) the usage mechanics work out, then this is actually a really good anti-big-model strategy.

They'll be incentivized for your success, not token-maximizing for their investors.

The team is super smart too. What's not to like?

Wishing them the best on launch.

Got myself the $20 subscription and tried it out. The 5-hour limit runs out surprisingly fast. Quality is okay but it feels slow, and even with my $20 Claude subscription on Fable, the credit usage ends up being lower. Fable usually catches issues in my Opus 4.8-generated code that I'd miss otherwise, but Fugu didn't. Makes me wonder if it's really at the Fable level. Hard to see the value here.

Imho there are two dimensions here: Firstly different LLMs and secondly the strategy in which you break down the problem in an agentic fashion (e.g. break up to separate agents with own persona and then judge evaluates across all agents). You can of course mix-up the dimensions as well and that's what I have been tinkering* with for a good few months with some success. This was all done using home-brew setup running on openrouter.

Personally I prefer understanding the dimensions and the interplay and controlling it though can see why openrouter and others are now offering this a solved solution.

Just be careful when you start outsourcing too much of your intelligence needs to a blackbox.

* https://github.com/monkeydust/rightmind

> Frontier-level performance without single-vendor dependency. [...] Plug collective intelligence directly into your workflows today with a single API.

Does multiple vendors run this "single API" or how is this not replacing a single-vendor dependency for another single-vendor dependency?

ngl, I thought sakana.ai was doing cooler stuff than this. that said, the release of a product like this makes sense because it follows your natural intuition when using these models. The best way to use LLMs is to have at least two in your pocket, because the models do a good job at covering each others assets and filling in obvious model-specific blindspots.

it's interesting that they're offering in the form of fixed cost subscription plans too. My impression was that the first party providers can do this because they api inference margins to the tune of 80ish percent. Anyone else orchestrating on top of these models have to pass through these costs or eat it themselves.

Is this the beginning of the Hyperion TechnoCore?

This is a joke, right?

Their research around building a domain specific model is pretty cool, it's kind of like Karpathy's autoresearch but pointed at deciding the optimal model to use at each step of the inference.

If cost becomes an even bigger problem being able to choose "best performance possible" or "strong but cost effective" will be useful.

https://arxiv.org/pdf/2512.04695

So basically... openrouter?

Will Le Chat try to eat Sakana? There is Le Chaton Fat and then there is Sakana Fugu too..

This would have been much more interesting and impactful if it had relied on open source models rather than commercial models that are only availble via an API.

The reasoning chains could have been used, and the resulting combined model could easily and effectively have been distilled.

Seems kinda underwhelming considering they raised like $400M.

Very interesting. I wonder if its kinda functions similarly to how OpenRouter's fusion API does. Hopefully isn't too long to respond.

Isn't this what perplexity is?

Can someone explain this in layman terms? I don't understand any of it

Just letting you guys know that the model is not a moat.

AI noob question, is this like Amp? I just use Amp, I ask it to do neat stuff and it does it. I desperately need to invest in my AI skills but every day I open two new tabs and add it to "AI stuff" folder, and then go back to drowning in work to do.

And yet, as per usual...

     Not yet available in the EU/EEA while we work toward compliance with GDPR and EU-specific regulations.

I’ve also developed and open-sourced Mythos level model using fusion/synthesis on TrustedRouter

https://trustedrouter.com/blog/fusion-evals-open-source

I probably will never pay to Sakana, as they are involved in military contracts.

https://japannews.yomiuri.co.jp/politics/defense-security/20...

> Frontier-level performance without single-vendor dependency. [...] Plug collective intelligence directly into your workflows today with a single API.

Does multiple vendors run this "single API" or how is this not replacing a single-vendor dependency for another single-vendor dependency?

Their research around building a domain specific model is pretty cool, it's kind of like Karpathy's autoresearch but pointed at deciding the optimal model to use at each step of the inference.

If cost becomes an even bigger problem being able to choose "best performance possible" or "strong but cost effective" will be useful.

https://arxiv.org/pdf/2512.04695

This would have been much more interesting and impactful if it had relied on open source models rather than commercial models that are only availble via an API.

The reasoning chains could have been used, and the resulting combined model could easily and effectively have been distilled.

Just letting you guys know that the model is not a moat.

I’ve also developed and open-sourced Mythos level model using fusion/synthesis on TrustedRouter

https://trustedrouter.com/blog/fusion-evals-open-source

While you're at it, feel free to send me $200 as well, I'll generate a crypto address ending with "AI".

TIL: I just found out that base58 disallows I (capital i), l (lowercase L), O (capital o) and 0 (zero), so I could only generate GrxoJt4eNXE2QaQ55iPSa7hhiYdzCo8ZeAuokmh2Cai.

(don't send anything, sharing only because of the base58 fun fact I didn't know)

Does it work? I’m less interested in economics than fit with an MVP.

at this point I might just try Neuralwatt and see how much request I can get with GLM5.2. I've read a lot of reviews that its very cheap to run using Neuralwatt cloud

Or use openrouter and switch to model you want to use..(i think so)

Yo dawg, I heard you like agents, so we put agents in yo agents so you can burn tokens while you burn tokens.

Pay $0 to run a local model or even a cheap DeepSeek V4 model via their API which is close to free per million tokens.

These prices are just going to get raced to $0.

Happy user here, pairing it with Composer 2.5, with Fugu Ultra as advisor and Fugur as planner. For scope/architecture it’s on par with useful Fable-style orchestration than one chat thread.

I've been shipping production on archive.tw with Fugu Ultra in /advisor on oh-my-pi.

Advisor doesn’t slow the loop if the driver stays fast. Worth it if your harness can split advisor from worker.

https://x.com/cortesi/status/2068898694238486658

u seem to be the only one who used it here - how did it compare to opus and gpt5.5? in theory it should be at least on par if not better at times right.

I'm glad eager people like you test for lazy people like me

Beta user: they piloted OpenRouter fusion before it was seen as the viable step. Everyone's understood for months now that having different models check each other is the best path forward.

This gets you that in a nice neat package, without the underlying tinkering mechanics.

If (big iff) the usage mechanics work out, then this is actually a really good anti-big-model strategy.

They'll be incentivized for your success, not token-maximizing for their investors.

The team is super smart too. What's not to like?

Wishing them the best on launch.

if you've used codex or claude, how do the usage limits on fugu feel compared to the pro plans on either? honestly wouldn't mind subscribing to this if it's as generous as what codex is giving me monthly, which seems unrealistic.

This is a joke, right?

Not necessarily. There were some tests last year-ish from hf that showed that simply alternating (randomly) between claude and gpt (whatever their versions were at the time) on a task produced better results than either of them individually. So during a task, the first call was sent to one, then the other and so on.

There's also the concept of "smart routing" requests based on some heuristics / embeddings. You'd get "simple" tasks handled by smaller (cheaper) models and use a bigger model to curate / sort / merge the results.

There's a lot of things to try here. I wouldn't personally pay for this service, but I don't think it's "a joke"...

Fugu Ultra <https://console.sakana.ai/models#fugu-ultra> sounds similar to GPT-5.5 Pro or Gemini 3.1 Deep Think .

Is there any official source that could confirms if Fable (or Mythos) is parallelized test-time compute (like GPT 5.5 Pro) or sparse Mixture-of-Experts (MoE) transformer combined with a multi-agent, inference-time compute scaling architecture (Gemini 3.1 Deep Think)?

So basically... openrouter?

OpenRouter Fusion is basically ask N models + synthesizer step.

This is ask a special orchestrator they built, which is in front of a bunch of models, which model would suit the request best.

Regular Fugu seems to be just "pick the best model and route the request there"

Fugu Ultra can generate like a little mini workflow/plan instead to achieve a result

1. Ask GPT to derive the math. 2. Ask Opus to check for implementation/security issues. 3. Ask Gemini to synthesize or resolve disagreement. 4. Return final answer.

I could be wrong but seems to be that at a glance, so I think it's more dynamic than OpenRouter Fusion.

links to two papers with at least enough apparent quality and novelty to get into ICLR 2026

> So basically... openrouter

:skull:

i now really wonder how many people of the public understood my thesis defense lol

Will Le Chat try to eat Sakana? There is Le Chaton Fat and then there is Sakana Fugu too..

Seems kinda underwhelming considering they raised like $400M.

400m is the new 400k! Just look at the other company evaluations and how much they raised vs what they delivered

it's just one of their products right

Very interesting. I wonder if its kinda functions similarly to how OpenRouter's fusion API does. Hopefully isn't too long to respond.

Yea similar, possibly even more steps / slower. I put together an all open source fusion at 1/3 of price of Fable: https://trustedrouter.com/blog/open-fusion-beats-fable-5

We open sourced it all

and will be releasing a similar orchestrator next week on TrustedRouter

From a brief reading of what Fusion does: https://openrouter.ai/docs/guides/features/plugins/fusion

Looks like Fusion calls a bunch of models and then uses an LLM to synthesize the results, and pass to another model for final output.

Fugu looks like it's doing something different? Using an LLM earlier on in the flow as an orchestrator to decide which other LLMs to call. More coordinator than simply synthesizing results, and more "agentic".

It's interesting because it's all exposed behind a single OpenAI compatible endpoint (Responses API?) and so then presumably someone could use this for one of their single agents. Now you have agent-of-agents, nested in some sense. The token usage increases accordingly!

Isn't this what perplexity is?

Can someone explain this in layman terms? I don't understand any of it

Basically, if you combine a bunch of near-frontier models (like GPT 5.5, etc) you can get performance that sometimes surpasses top line models like Claude's Fable.

Sakana seems to have a separate approach using a domain specific model to perform the model routing step.

And yet, as per usual...

     Not yet available in the EU/EEA while we work toward compliance with GDPR and EU-specific regulations.

I probably will never pay to Sakana, as they are involved in military contracts.

https://japannews.yomiuri.co.jp/politics/defense-security/20...

Yeah, I was trying to parse their "defense policy" https://sakana.ai/company-info/defense-policy.html?lang=en But it seems like lot of words to say we have no policy and we'll just go along with the powers that be. Like they rely on deferring to the Pacifist constitution, which the current administration if moving mountains to try and change. And when it it you can bet they will not want to give up their defense contracts.

TIL: I just found out that base58 disallows I (capital i), l (lowercase L), O (capital o) and 0 (zero), so I could only generate GrxoJt4eNXE2QaQ55iPSa7hhiYdzCo8ZeAuokmh2Cai.

(don't send anything, sharing only because of the base58 fun fact I didn't know)

at this point I might just try Neuralwatt and see how much request I can get with GLM5.2. I've read a lot of reviews that its very cheap to run using Neuralwatt cloud

Happy user here, pairing it with Composer 2.5, with Fugu Ultra as advisor and Fugur as planner. For scope/architecture it’s on par with useful Fable-style orchestration than one chat thread.

I've been shipping production on archive.tw with Fugu Ultra in /advisor on oh-my-pi.

Advisor doesn’t slow the loop if the driver stays fast. Worth it if your harness can split advisor from worker.

There's a lot of things to try here. I wouldn't personally pay for this service, but I don't think it's "a joke"...

Fugu Ultra <https://console.sakana.ai/models#fugu-ultra> sounds similar to GPT-5.5 Pro or Gemini 3.1 Deep Think .

OpenRouter Fusion is basically ask N models + synthesizer step.

This is ask a special orchestrator they built, which is in front of a bunch of models, which model would suit the request best.

Regular Fugu seems to be just "pick the best model and route the request there"

Fugu Ultra can generate like a little mini workflow/plan instead to achieve a result

1. Ask GPT to derive the math. 2. Ask Opus to check for implementation/security issues. 3. Ask Gemini to synthesize or resolve disagreement. 4. Return final answer.

I could be wrong but seems to be that at a glance, so I think it's more dynamic than OpenRouter Fusion.

links to two papers with at least enough apparent quality and novelty to get into ICLR 2026

> So basically... openrouter

:skull:

i now really wonder how many people of the public understood my thesis defense lol

> Le Chaton Fat

For others looking around: LCF is a meme model, it's not real. It's a joke.

400m is the new 400k! Just look at the other company evaluations and how much they raised vs what they delivered

it's just one of their products right

Yea similar, possibly even more steps / slower. I put together an all open source fusion at 1/3 of price of Fable: https://trustedrouter.com/blog/open-fusion-beats-fable-5

We open sourced it all

and will be releasing a similar orchestrator next week on TrustedRouter

From a brief reading of what Fusion does: https://openrouter.ai/docs/guides/features/plugins/fusion

Looks like Fusion calls a bunch of models and then uses an LLM to synthesize the results, and pass to another model for final output.

Is Perplexity still a daily driver for a lot of folks?

Basically, if you combine a bunch of near-frontier models (like GPT 5.5, etc) you can get performance that sometimes surpasses top line models like Claude's Fable.

Sakana seems to have a separate approach using a domain specific model to perform the model routing step.

The price to pay for claiming your laws are applicable worldwide.

Anthropic was very much willing & involved in U.S. military contracts before the falling out with the DoD. OpenAI is actively involved.

https://openai.com/index/our-agreement-with-the-department-o...

I imagine if it was Deepseek partnering with the CCP it would be different?

Or use openrouter and switch to model you want to use..(i think so)

Or TrustedRouter if you want privacy and open source

Pay $0 to run a local model or even a cheap DeepSeek V4 model via their API which is close to free per million tokens.

These prices are just going to get raced to $0.

I used to have a $20/mo ChatGPT subscription and now I spend $12 per year using Kimi models on OpenRouter, and that's with zero-data-retention-only providers (some models sometimes have free providers with scary tracking). Maybe I just don't use that many tokens, I don't fill the context with more than what's needed for a specific request, but it goes to show how these subscriptions can be an absolute ripoff. The thought of spending 200x that is insane to me

Maybe. But for now it's fascinating how $200/month has kind of become a normal tier.

It's similar to how AirPods normalised all of us having $300+ headphones. All of us would have scoffed at the idea a decade ago.

Not while the hardware required to run a local model at an acceptable speed costs way more than $200.

Guess what, the big players are hoarding all the RAM and GPUs so that other people can't afford decent hardware. It's working out beautifully for them!

Hard to say - since I used it in Beta with free credits, where the usage felt more 'Opus' than 'ChatGPT' but more efficient token wise. Switching models every time is annoying.

But their paid plans I'm not sure yet - planning to subscribe and can let you know.

Almost no chance it will be as generous as OpenAI though. They just don't have the money :-)

Hard to say - since I used it in Beta with free credits, where the usage felt more 'Opus' than 'ChatGPT' but more efficient token wise. Switching models every time is annoying.

But their paid plans I'm not sure yet - planning to subscribe and can let you know.

Almost no chance it will be as generous as OpenAI though. They just don't have the money :-)

So plain old ensemble technique in classical ML.

But it's priced the same as frontier models. Why do I not directly pay for frontier models?

I imagine if it was Deepseek partnering with the CCP it would be different?

I was just stating facts about Sakana, and that was enough to trigger you? For the same reason, I don’t use GPT either. At least for now, DeepSeek has no ties to the defense sector. And don’t talk as if the CCP were the devil. The U.S. president is the world’s biggest arms dealer, after all.

Or TrustedRouter if you want privacy and open source

You ought to realize that shilling your product in the comments doesn't exactly come across as trustworthy.

Maybe. But for now it's fascinating how $200/month has kind of become a normal tier.

It's similar to how AirPods normalised all of us having $300+ headphones. All of us would have scoffed at the idea a decade ago.

Many people here spent a lot more than $300 on headphones long before AirPods appeared.

The Sony WH-1000XM series and the Bose QC35 were the standard quality headphones years before AirPods were a thing, and both retailed at $300+.

Not while the hardware required to run a local model at an acceptable speed costs way more than $200.

Guess what, the big players are hoarding all the RAM and GPUs so that other people can't afford decent hardware. It's working out beautifully for them!

> Not while the hardware required to run a local model at an acceptable speed costs way more than $200

It's $200/month. You have to take into account energy costs and all the rest of a system, but if you break even within 1-2 years ($2400-$4800) it'd be a pretty good deal. And $4000 buys you a pretty decent system.

> Not while the hardware required to run a local model at an acceptable speed costs way more than $200

But it's priced the same as frontier models. Why do I not directly pay for frontier models?

This is a charitable read, but I think that being able to pick from a panoply of models will actually yield much better results in the long run.

The same model that has been post-trained to operate for hours as a Linux admin will be incapable of writing a heartfelt email, but with something like Fugu, you'd get both the Linux admin for driving the browser harness and the smaller writing specialist model for drafting the email itself.

u get a pool of them + sakana?

Of course DeepSeek is used by the PLA, why would you think otherwise? https://www.scmp.com/news/china/military/article/3303512/chi... There can be multiple devils, after all.

> At least for now, DeepSeek has no ties to the defense sector.

Like every company based in China they are under the control of the Chinese state, which is an armed entity known to use violence.

Many people here spent a lot more than $300 on headphones long before AirPods appeared.

Those were hobbyists, audiophiles, professionals, artists (recording, performing, etc.).

They are talking about a much larger group of people.

I had a really nice Sennheiser before that, too. But now you hop on the subway and everybody sports one.

The Sony WH-1000XM series and the Bose QC35 were the standard quality headphones years before AirPods were a thing, and both retailed at $300+.

Fugu, eh? So there’s a nonzero chance this thing might kill me?

How do you configure it to run with pi or claude code? I'm curious to try it (via subscription ideally)

EDIT: Found something here https://dev.classmethod.jp/en/articles/sakana-fugu-ga-first-...

I'm a big fan of Sakana though, and have followed David Ha / @hardmaru since the world models papers (with the racing car game and the Doom clone), which were incredible at the time.

u get a pool of them + sakana?

I had a really nice Sennheiser before that, too. But now you hop on the subway and everybody sports one.

So plain old ensemble technique in classical ML.

Of course, premium headphones existed before. I have a WH-1000XM4 sitting right next to me.

But your aunt Josie didn't have one. Now Apple is selling 80 million units / year and the ~$300 price tag has become normal. Before that, most people had headphones that were 10 times cheaper.

Nice idea but expensive. It looks like they don’t add very low cost models like DeepSeek v4 flash into their mix.

We found that an all open source fusion was 1/3 the price and better than Fable

https://trustedrouter.com/blog/open-fusion-beats-fable-5

There are so many derisive comments here.

However, the man is clearly driven; and he and his team may have more to offer in future. I admire the man for not taking the conventional AI-research career path.

Indeed. The world models research many labs are now chasing was to some degree ignited by David Ha and Schmidhuber's 2018 paper.

More broadly, Sakana is pursing a refreshingly distinct research path, with their focus on evolutionary methods, biological intelligence (e.g. continuous thought machines) and open publication.

so he's the quintessential brilliant jerk, ok

Kind of shocking - a model comes out that beats mythos and offers a reasonable price and it ... gets downvoted?

Probably taking hate from both sides - OpenAI / Claude fans who are undercutting its moat. Chinese open-model fans that want it to be cheaper.

But it's a genuine accomplishment to hit those benchmarks and offer a reasonable plan?

Bizarre reaction TBH.

Those were hobbyists, audiophiles, professionals, artists (recording, performing, etc.).

They are talking about a much larger group of people.

Personally I prefer understanding the dimensions and the interplay and controlling it though can see why openrouter and others are now offering this a solved solution.

Just be careful when you start outsourcing too much of your intelligence needs to a blackbox.

* https://github.com/monkeydust/rightmind

Is this the beginning of the Hyperion TechnoCore?

I think OP meant noise-cancelling headphones, which were fairly ubiquitous in tech circles in open offices; before Apple launched AirPods.

This is interesting. Would you share a few ways in which you're using this in your workflow? What about if you were to start a new project and test and built it out from scratch - how do you work this approach in without bogging everything down(including the simple things) down with overanalysis?

This is a charitable read, but I think that being able to pick from a panoply of models will actually yield much better results in the long run.

Of course DeepSeek is used by the PLA, why would you think otherwise? https://www.scmp.com/news/china/military/article/3303512/chi... There can be multiple devils, after all.

I'm glad eager people like you test for lazy people like me

> Le Chaton Fat

For others looking around: LCF is a meme model, it's not real. It's a joke.

Is Perplexity still a daily driver for a lot of folks?

Yo dawg, I heard you like agents, so we put agents in yo agents so you can burn tokens while you burn tokens.

Anthropic was very much willing & involved in U.S. military contracts before the falling out with the DoD. OpenAI is actively involved.

https://openai.com/index/our-agreement-with-the-department-o...

The price to pay for claiming your laws are applicable worldwide.

Indeed. The world models research many labs are now chasing was to some degree ignited by David Ha and Schmidhuber's 2018 paper.

More broadly, Sakana is pursing a refreshingly distinct research path, with their focus on evolutionary methods, biological intelligence (e.g. continuous thought machines) and open publication.

> At least for now, DeepSeek has no ties to the defense sector.

Like every company based in China they are under the control of the Chinese state, which is an armed entity known to use violence.

Does it work? I’m less interested in economics than fit with an MVP.

u seem to be the only one who used it here - how did it compare to opus and gpt5.5? in theory it should be at least on par if not better at times right.

I only had time to use it for a couple of deep reviews of large Rust projects, and a few agentic coding tasks (implement plan X, refactor Y in fashion Z) before my quota ran out. My impression is that the reviews were quite strong - maybe Opus 4.8+ or around GPT 5.5 (for my particular use case) - but very slow. For implementation I found it weaker, it made a few mistakes that I haven't seen frontier models make in a long time.

Of course, premium headphones existed before. I have a WH-1000XM4 sitting right next to me.

But your aunt Josie didn't have one. Now Apple is selling 80 million units / year and the ~$300 price tag has become normal. Before that, most people had headphones that were 10 times cheaper.

$300 isn’t what AirPods cost though. You can get a pair of AirPods 4 for $129 on Apple.com, and I presume that is still the most popular model. If you’re paying ~$300, you are buying premium headphones.

You ought to realize that shilling your product in the comments doesn't exactly come across as trustworthy.

Kind of shocking - a model comes out that beats mythos and offers a reasonable price and it ... gets downvoted?

Probably taking hate from both sides - OpenAI / Claude fans who are undercutting its moat. Chinese open-model fans that want it to be cheaper.

But it's a genuine accomplishment to hit those benchmarks and offer a reasonable plan?

Bizarre reaction TBH.

so he's the quintessential brilliant jerk, ok

We found that an all open source fusion was 1/3 the price and better than Fable

https://trustedrouter.com/blog/open-fusion-beats-fable-5

Oh! I thought TrustedRouter was a joke/sarcasm. Very wrong placement of the comment.

Disclosing affiliation hasn’t been a legal thing for a while. It’s reputational. Knowing that firm spams is a black mark.

It’s all open source and I say that it’s mine in all the sibling comments above

The beauty of your approach: when people are not paying for an expensive subscription, they can decide to use models less and not feel like they are leaving money on the table.

Which model is that? What is it named?

Brilliant. What this actually is, is a swarm, albeit a very small one. I'm wondering if for research specifically, swarm size (on higher temp?) would outweigh model size.

At least, for the initial data gathering phase. You'd probably want a sequence of progressively larger models to filter it.

Have you guys tested it on anything other than research?

I think OP meant noise-cancelling headphones, which were fairly ubiquitous in tech circles in open offices; before Apple launched AirPods.

Airpods Inc. would be very high up SP500 as a standalone business.

The base model where I live (Central Europe) is $194. The Pro is $357. The Max is $779.

I just averaged it out.

It’s all open source and I say that it’s mine in all the sibling comments above

Oh! I thought TrustedRouter was a joke/sarcasm. Very wrong placement of the comment.

Disclosing affiliation hasn’t been a legal thing for a while. It’s reputational. Knowing that firm spams is a black mark.

The beauty of your approach: when people are not paying for an expensive subscription, they can decide to use models less and not feel like they are leaving money on the table.

Brilliant. What this actually is, is a swarm, albeit a very small one. I'm wondering if for research specifically, swarm size (on higher temp?) would outweigh model size.

At least, for the initial data gathering phase. You'd probably want a sequence of progressively larger models to filter it.

Have you guys tested it on anything other than research?

Working on bio coding and cybersecurity benchmarks now

Which model is that? What is it named?

Not OP but I believe it’s Fugu (7B). According to [0]: “Fugu itself is a trained coordinator LLM.”

[0] https://dev.classmethod.jp/en/articles/sakana-fugu-ga-first-...

The base model where I live (Central Europe) is $194. The Pro is $357. The Max is $779.

I just averaged it out.

Airpods Inc. would be very high up SP500 as a standalone business.

Working on bio coding and cybersecurity benchmarks now

Not OP but I believe it’s Fugu (7B). According to [0]: “Fugu itself is a trained coordinator LLM.”

[0] https://dev.classmethod.jp/en/articles/sakana-fugu-ga-first-...

One Model to Command Them All マルチエージェントを指揮する、一つのモデル

Frontier-level performance without single-vendor dependency. Fugu dynamically orchestrates the world's best models to tackle complex, multi-step tasks. Plug collective intelligence directly into your workflows today with a single API. Sakana Fugu は、世界のトップモデル群を動的にオーケストレーションし、複数ステップに及ぶ複雑なタスクを自動的に解決します。高いパフォーマンスを実現するAPIを、あなたのワークフローに組み込みましょう。

Not yet available in the EU/EEA while we work toward compliance with GDPR and EU-specific regulations. GDPR等のEU/EEA固有規制への対応を進めており、現在はEU・EEA域内ではご利用いただけません。

What is Sakana Fugu ?

A Multi-Agent System, Delivered as One Model マルチエージェントを、一つのモデルAPIとして提供

Sakana Fugu achieves superior performance by dynamically coordinating and orchestrating a diverse pool of powerful models. Instead of using domain knowledge to prescribe team organization, roles, or workflows, Fugu learns to dynamically assemble agents from a pool and coordinate them through non-obvious but highly efficient collaboration patterns. Sakana Fugu は、強力で多様なモデル群を動的に組み合わせ、協調させることで高いパフォーマンスを実現します。人間が思い付かないようなモデルの編成や役割分担、処理の進め方など、効率よく学習しながら成果を発揮します。

Sakana Fugu architecture overview

One API to Access All in an Optimized Way 一つのAPIで、複数モデルを最適に活用

Access a coordinated pool of specialized models through one API. Fugu handles model selection and switching for each task, reducing API complexity while improving cost-performance. 専門特化型のモデル群を、一つのAPIから利用することができます。タスクごとのモデルの選択と切り替えは Sakana Fugu が担うため、APIまわりの煩雑さを抑えつつ、コストパフォーマンスを高められます。

Offering Superior Performance on Complex Tasks 複雑なタスクで優れたパフォーマンス

Built for coding, reasoning, and other quality-critical workflows, Fugu coordinates expert agents to tackle complex tasks with stronger, more reliable results. Sakana Fugu は、コーディングや推論（リーズニング）など、高い品質が問われるワークフローのために設計されています。専門エージェントを連携させることで、複雑なタスクにもより確かで信頼できる答えを導きます。

Providing Flexibility in Agent Selection 柔軟なエージェント選択

Control which agents can participate in Fugu’s model pool. Opt out of specific providers or models to meet data, privacy, compliance, or organizational requirements. Sakana Fugu のモデルプールに加えるエージェントを選ぶことができます。データ、プライバシー、コンプライアンス、または組織の要件を満たすために、特定のプロバイダーやモデルを除外することが可能です。

Tech Behind

Research-Driven Coordination for Multi-Agent Intelligence マルチエージェントの知能を支える、
最新研究に基づく協調技術

Sakana Fugu is grounded in two ICLR 2026 papers on learned model orchestration: TRINITY and the Conductor. Together, they show how systems can learn to assemble, route, and coordinate expert agents for each task instead of relying on hand-designed workflows. For a deeper look at the ideas behind the system, explore our technical report . Sakana Fugu は、モデルのオーケストレーションを学習で実現する2本のICLR 2026論文「TRINITY」と「Conductor」を基盤としています。これらの研究は、人手で設計したワークフローに頼るのではなく、タスクごとに専門エージェントをどう編成し、振り分け、連携させるかをシステム自身が学習できることを示しています。仕組みの詳細は、テクニカルレポートをご覧ください。

Cover image for the TRINITY research paper.

PAPER

TRINITY: An Evolved LLM Coordinator TRINITY：進化型LLMコーディネーター

Trinity uses a lightweight evolved coordinator to orchestrate multiple LLMs over several turns, assigning Thinker, Worker, or Verifier roles to adaptively delegate work across coding, math, reasoning, and knowledge tasks. TRINITY は、軽量な進化型コーディネーターが複数のLLMを複数ターンにわたって統括する仕組み。各モデルに「Thinker（思考役）」「Worker（実行役）」「Verifier（検証役）」の役割を割り当て、コーディング・数学・推論・知識といった幅広いタスクに応じて、作業を適応的に振り分ける。

Cover image for the Conductor research paper.

PAPER

Learning to Orchestrate Agents in Natural Language with the Conductor Conductor による自然言語でのエージェント統率の学習

The Conductor is trained with reinforcement learning to discover natural-language coordination strategies, designing agent communication patterns and focused prompts that help diverse LLM pools outperform individual workers on challenging reasoning benchmarks. Conductor は強化学習によって訓練され、自然言語ベースの協調戦略を自ら見つけ出す。エージェント間のやり取りの型や、要点を絞ったプロンプトを設計することで、多様なLLMの集まりが、難度の高い推論ベンチマークで単体のモデルを上回る力を発揮。

How to Use

Unlock Multi-Agent Intelligence Through An API API を通じてマルチエージェント知能を解き放つ

Sakana Fugu comes in two models — Fugu and Fugu Ultra — both available through one OpenAI-compatible API. Pick the model that fits your workload, or switch between them without changing your integration. Sakana Fugu には Fugu と Fugu Ultra の 2 つのモデルがあり、どちらも OpenAI 互換 API から利用できます。ワークロードに合うモデルを選んでも、連携を変えずに両者を切り替えてもかまいません。

Fugu Balanced performance and latency 性能とレイテンシのバランス

Fugu balances strong performance with low latency, making it the ideal default for everyday work. Drop it into tools like Codex for coding and code review, or power responsive chatbot services — all behind a single endpoint. You can also opt specific agents out of its pool to meet data, privacy, and compliance constraints. Sakana Fugu は高い性能と低レイテンシを両立し、日々の作業に最適な標準モデルです。Codex のようなツールに組み込んでコーディングやコードレビューに使ったり、応答性の高いチャットボットを動かしたり——すべてをひとつのエンドポイントで実現します。データ・プライバシー・コンプライアンスの制約に合わせて、プールから特定のエージェントを除外することもできます。

Fugu Ultra Optimized for performance 性能に最適化

Fugu Ultra coordinates a deeper pool of expert agents to maximize answer quality on hard, high-stakes problems. Early users rely on it for Kaggle competitions, paper reproduction, cybersecurity analysis, and literature and patent investigations. Fugu Ultra は、より広い専門エージェントのプールを連携させ、難易度が高く重要な問題で回答品質を最大化します。先行ユーザーは、Kaggle コンペティション、論文の再現、サイバーセキュリティ分析、文献・特許調査などに活用しています。

Quantitative Results

Quantitative Results Sakana Fugu の性能：定量評価

Our Fugu models surpass publicly accessible frontier models and are shoulder-to-shoulder with Fable 5 and Mythos Preview in various rigorous engineering, scientific, and reasoning benchmarks while delivering frontier capability without the risk of export controls. 二つのFuguモデルは、一般に利用できるフロンティアモデルを上回り、エンジニアリング・科学・推論のさまざまな難関ベンチマークでも、Fable 5やMythos Previewと肩を並べます。しかも、輸出規制のリスクを負うことなく、フロンティアレベルの実力を発揮します。

Benchmark comparison chart

Performance comparison of Fugu models and baseline frontier models across a suite of coding, reasoning, scientific, and agentic benchmarks. For Fable 5 and Mythos Preview, we report the max of the two if both scores are available on the same benchmark. Neither of them is in Fugu’s agent pool as they are not publicly accessible. コーディング、リーズニング、科学、エージェント能力に関するベンチマーク群における、Fuguモデルとベースラインのフロンティアモデルの性能比較。Fable 5とMythos Previewについては、同一ベンチマークで両方のスコアが入手できる場合、その高い方を採用。なお、両モデルは一般提供されていないため、Fuguのエージェントプールには含まれていない。

Highest scores are shown in boldface; second-highest scores are underlined. 最高スコアは太字、2 番目に高いスコアは下線で示しています。

Benchmark	Fugu	Fugu Ultra	Opus 4.8 †	Gemini 3.1 Pro †	GPT 5.5 †
SWE Bench Pro *	59.0	73.7	69.2	54.2	58.6
TerminalBench 2.1	80.2	82.1	74.6	70.3	78.2
LiveCodeBench	92.9	93.2	87.8	88.5	85.3
LiveCodeBench Pro	87.8	90.8	84.8	82.9	88.4
Humanity’s Last Exam	47.2	50.0	49.8	44.4	41.4
CharXiv Reasoning	85.1	86.6	84.2	83.3	84.1
GPQA-D	95.5	95.5	92.0	94.3	93.6
SciCode	60.1	58.7	53.5	58.9	56.1
τ³ Banking	21.7	20.6	20.6	8.4	20.6
Long Context Reasoning	74.7	73.3	67.7	72.7	74.3
MRCRv2	86.6	93.6	87.9	84.9	94.8

* We use the mini-swe-agent as the scaffolding for this task. * mini-swe-agent をスキャフォールドとして使用。

† We use model provider-reported scores for the baselines. † モデル提供元が公表したスコア。

Qualitative Results

Qualitative Results Sakana Fugu の性能：定性的な例

These examples compare Sakana Fugu models with three frontier baselines — Gemini 3.1 Pro (high) , Opus 4.8 (max) , and GPT 5.5 (xhigh) . To keep the focus on behavior rather than brand-by-brand attribution, the baselines are anonymized as Model A , Model B , and Model C in each description. The mapping is intentionally not fixed across examples. 以下の例では、 Sakana Fugu を、 Gemini 3.1 Pro（high） 、 Opus 4.8 （max） 、 GPT 5.5（xhigh） の3つのフロンティアモデルと比較しています。個別モデルではなく挙動の違いに注目できるよう、ベースラインを Model A 、 Model B 、 Model C として匿名化しています。 なお、どのモデルがA〜Cかは例ごとに変えています。

This experiment shows an AI agent autonomously improving a small GPT's training recipe. Using AutoResearch (Karpathy et al.) – which iteratively edits training code, runs experiments, and keeps only changes that lower validation bits-per-byte (BPB) – the agent ran 123 experiments over ~14 hours on a single H100 GPU. Each line traces a system's best BPB as experiments accumulate: Fugu-Ultra is in bold red (solid = mean over three seeds, dashed = best single run), with three frontier-model baselines (Model A, B, and C) faded behind it, and the callouts mark each new improvement the agent found on its own — spanning batch size, model depth, learning rates, and optimizer settings. Fugu-Ultra finishes with the best mean BPB (0.9774 ± 0.0019), ahead of Model C (0.9781), Model B (0.9793), and Model A (0.9822), and its best single run reaches 0.9748, leading every baseline. This suggests that orchestrating multiple strong models can outperform any individual frontier model on agentic ML research. 例1 — AutoResearch / LLM学習
AIエージェントに小規模なGPTの学習レシピを自律的に改善させる実験。学習コードを反復的に書き換え、実験を実行し、検証用 bits-per-byte（BPB）を下げた変更だけを残していくエージェント型フレームワーク AutoResearch（Karpathy et al.）を用い、エージェントは単一のH100 GPU上でおよそ14時間にわたり123回の実験を実施した。各線は、実験が積み重なるにつれて各システムが達成した最良のBPBの推移を表している。Fugu-Ultra は太い赤の線（実線＝3シードの平均、破線＝最良の単一実行）で示し、その背後に3つのフロンティアモデルのベースライン（Model A・B・C）を淡色で重ねている。吹き出しは、エージェントが自ら見つけた改善点をそれぞれ示しており、バッチサイズ、モデルの深さ、学習率、オプティマイザの設定など多岐にわたる。Fugu-Ultra は最終的に最良の平均BPB（0.9774 ± 0.0019）を達成し、Model C（0.9781）、Model B（0.9793）、Model A（0.9822）を上回った。最良の単一実行では 0.9748 に到達し、すべてのベースラインを上回っている。これらの結果は、複数の強力なモデルをオーケストレーションすることで、エージェント型のML研究において単体のフロンティアモデルを上回り得ることを示唆している。

This case study tests whether the reading order of classical Japanese kana letters (仮名消息) can be recovered — letters whose scattered chirashigaki ("scattered-writing") layout makes that genuinely hard even for trained readers of classical Japanese. Each model is given the character bounding boxes together with a rough set of reading-order rules, and writes code that outputs the order the characters should be read in; here it runs on a letter written in 1610 by Hōshun'in (芳春院, 1547–1617), scored by NED (a score based on normalized edit distance from an expert's ground-truth order, where 1.0 is a perfect match). Several frontier models were put through the identical pipeline, but none came close to Fugu-Ultra on this letter: Model A reached only NED 0.24 and Model B scored no better, both far below Fugu-Ultra's 0.80, while Model C produced no predictor at all. The clip shows the two extremes — each panel draws its predicted path in red over the expert's ground truth in green: Fugu-Ultra (top) traces the letter almost exactly, while Model A (bottom) jumps all over the page. (Letter held by the Keio Institute of Oriental Classics.) 例2 — 仮名消息の読み順推定
本ケーススタディは、仮名消息（古典日本語のかな書状）という歴史的資料における読み順の推定問題を対象とする。仮名消息は、文字を紙面に散らして記す「散らし書き」という形式で書かれているため、古文書を読み慣れた人でも文字の読み順を正しく判定することは難しい。そこで各モデルに対して、文字を囲む四角形（バウンディングボックス）と読み順の大まかなルールを与え、文字の読み順を推定するコードを出力させた。実験の対象には1610年に芳春院（ほうしゅんいん、1547–1617）が記した書状を選び、NED（専門家による正しい読み順との正規化編集距離にもとづくスコア。1.0が完全一致）で評価した。複数のフロンティアモデル（A-C）を同一のパイプラインに通したところ、Fugu-Ultraの結果は他のモデルを大きく引き離した。Model AはNED 0.24、Model Bもそれと大差なく、いずれもFugu-Ultraの0.80には遠く及ばない。さらにModel Cはまともなコードを一回も出力できなかった。モデルによる読み順の違いを可視化するために、専門家による正解の読み順（緑）の上に、推定した経路（赤）を描いて映像化した。Fugu-Ultra（上）が読み順をほぼ正確になぞる一方、Model A（下）は紙面全体をあちこち飛び回り、両者は大きく異なる結果を示している。図：芳春院消息（慶應義塾大学斯道文庫蔵）

In this benchmark, each of Fugu-Ultra and 3 frontier models is given a single prompt to write a Rubik's Cube solver from scratch in pure Python — no off-the-shelf solving libraries allowed — and the resulting program is run locally on a held-out set of 300 randomly scrambled cubes. Solution quality is measured by the number of moves a solution uses, where lower is better. Fugu-Ultra and the frontier Model A wrote solvers that ran and solved all 300 cubes, while Model B and Model C each shipped sophisticated-looking code that crashed on execution and returned no valid solution at all (0/300). The clip follows cube #17: from the same scramble, Fugu-Ultra's solver reaches the solved state in 19 moves while Model A needs 21 — and across all 300 cubes Fugu-Ultra averages 19.72 moves versus 19.76 for Model A, both right at the optimal frontier, with Fugu-Ultra never a move longer than Model A on any cube (7 wins, 293 ties, 0 losses). 例3 — ルービックキューブ・ソルバー
本ベンチマークでは、Fugu-Ultraと3つのフロンティアモデルそれぞれに、純粋なPythonのみでルービックキューブソルバーをゼロから実装するよう単一のプロンプトを与えた。既存のソルバーライブラリの使用は禁止とし、生成されたプログラムをランダムにスクランブルされた300個のキューブからなるホールドアウトセットに対してローカルで実行した。解法の質は手数で評価し、少ないほど良いとする。Fugu-Ultraとフロンティアの Model A は300個すべてのキューブを解くソルバーを生成したが、Model B と Model C は一見洗練されたコードを出力したものの、実行時にクラッシュし、有効な解を一つも返せなかった（0/300）。映像はキューブ#17の様子である。同一のスクランブルに対し、Fugu-Ultraのソルバーは19手で完成状態に到達したのに対し、Model A は21手を要した。300個全体の平均では、Fugu-Ultraが19.72手、Model A が19.76手と、いずれも最適解の水準にあり、Fugu-Ultraが Model A より手数が多かったケースは一度もなかった（7勝・293引き分け・0敗）。

Task: Create a mechanical iris in CAD, like a camera aperture, where multiple blades move together to open and close the central hole. For each model, we show both the generated detailed CAD itself and a simplified view that makes the structure easier to see. In the CAD generated by Fugu Ultra, the blades rotate around outer pins and clearly open and close the aperture. In contrast, the CAD generated by the other models shows problems such as gaps appearing, weak linkages, or the aperture not closing fully. 例4 — CAD メカニカルアイリス
タスク：カメラの絞り（アパーチャ）のような、複数の羽根が連動して動き中央の穴を開閉する機械式アイリスをCADで作成する。各モデルについて、生成された詳細CAD（Detailed CAD）そのものと、構造を見やすくするための簡易ビュー（Simplified view）の両方を示す。Fugu Ultraが生成したCADでは、羽根が外側のピンを軸に回転し、アパーチャを明確に開閉できている。一方、他のモデルが生成したCADでは、隙間ができてしまう、リンク機構が弱い、アパーチャを十分に閉じきれていない、といった問題が見られる。

Four blindfold chess games, back to back. Every model plays the same way — no board shown — holding the full game in memory. Fugu outplays four strong opponents: three leading frontier models and a 2100-Elo Stockfish engine, staying accurate where they drift and ending each game in checkmate. 例5 — 目隠しチェス
4局の目隠しチェスを連続して対局している。すべてのモデルは同じ条件でプレイし、盤面は一切表示されず、ゲーム全体を記憶の中に保持しながら指し手を進める。Fugu は4つの強力な相手——3つの主要なフロンティアモデルと、2100-Elo の Stockfish エンジン——を打ち負かした。相手が手を乱していく場面でも正確さを保ち、いずれの対局もチェックメイトで終えた。

This benchmark uses a single anonymized equity over one historical 50-week window and is intended to compare sequential, no-look-ahead decision-making rather than to establish generalizable trading performance. Past performance does not guarantee future results, and results may not transfer to other assets, time periods, or live markets. Each model makes online trading decisions on anonymized STOCK_X, using only current and past weekly market data: opening, high, low, and closing prices, volume, returns, moving averages, volatility, drawdown, portfolio state, and prior feedback. Starting with $10,000, the agent chooses whether to buy, hold, or sell, and what fraction of cash or shares to trade. After each action, the next week's price is revealed and the portfolio is updated, so the model must adapt from feedback rather than seeing the future. Across five runs of the identical 50-week pipeline, Fugu-Ultra grew the portfolio to $11,943.22 ± $633.86, a +19.43% mean return, while the other frontier models reached their return less than +15%. 例6 — 株式トレーディング
匿名化された単一銘柄を1つの過去50週間のウィンドウで用いるこの株式トレーディングのベンチマーク。汎用的なトレーディング性能を立証するためではなく、先読みのない逐次的な意思決定を比較することを目的としている。過去の実績は将来の結果を保証するものではなく、結果が他の資産・期間・実際の市場に当てはまるとは限らない。各モデルは、匿名化された STOCK_X に対して、現在および過去の週次マーケットデータ——始値、高値、安値、終値、出来高、リターン、移動平均、ボラティリティ、ドローダウン、ポートフォリオの状態、直前のフィードバック——のみを用いてオンラインでトレーディングの意思決定を行う。1万ドルからスタートし、エージェントは買い・保有・売りのいずれかと、現金または株式のどの割合を取引するかを選択する。各アクションの後に翌週の価格が開示され、ポートフォリオが更新されるため、モデルは未来を見るのではなくフィードバックから適応しなければならない。同一の50週間パイプラインを5回実行した結果、Fugu-Ultra はポートフォリオを 11,943.22 ± 633.86 ドルまで成長させ、平均リターンは +19.43% に達した。一方、他のフロンティアモデルのリターンはいずれも +15% 未満にとどまった。

Users' Voices

What do our users think about Sakana Fugu ? Sakana Fugu に対するユーザー評価

Software Engineer ソフトウェアエンジニア

Coding & Code Review コーディング＆コードレビュー

For code review, Fugu Ultra is significantly better than GPT-5.5. It gives comprehensive answers and finds the bugs others miss. Where other tools flag about three issues, Sakana Fugu surfaced more than twenty. It's become the model I run all my reviews through. コードレビューでは、Fugu Ultra は回答が網羅的で、他のモデルが見逃すバグまで見つけてくれました。他のツールでは3件くらいの問題しか指摘されなかったが、 Sakana Fugu は20件以上を洗い出してくれました。

Researcher (industry) 研究者（企業）

Research & Autonomy 自律的なリサーチ

I was mapping a patent landscape across ~20 papers and several patents, normally 3–4 days of work. With Fugu I had a full analysis in a few hours, including connections between papers I would never have spotted on my own. 約20本の論文と複数の特許にまたがる特許動向（パテントランドスケープ）を作成しました。普段なら3〜4日かかる作業が、 Sakana Fugu を使うと数時間で完全な分析ができ、そのなかには、自分では決して気づけなかっただろう論文同士のつながりを見つけることができました。

Executive (enterprise platform) プラットフォーム企業・役員

Orchestration オーケストレーション

Raw output quality is on par with top frontier models, but Fugu showed unusually strong persona stability across long sessions, holding its identity where other models drift. For agent products, that may matter more than raw benchmark scores. 素の出力品質はトップクラスのフロンティアモデルと同等。加えて Sakana Fugu は、長時間のセッションでもペルソナが安定しており、他のモデルなら崩れてしまう場面でもキャラクターを保ち続けました。エージェントにとっては、これは単純なベンチマークスコア以上に重要なことです。

Researcher 研究者

Paper Reproduction 論文の再現

From one simple request, Sakana Fugu worked autonomously for nearly four hours — reading the paper, implementing, training, evaluating, and analyzing the gaps. 一つのシンプルな指示から、 Sakana Fugu はおよそ4時間続けて自律的に作業しました。論文を読み込み、実装・学習・評価まで行い、足りない点を分析してくれました。

Security Engineer セキュリティエンジニア

Security Assessment セキュリティ評価

Given one scoped instruction, Sakana Fugu drove a full security assessment end-to-end — recon, XSS/SQLi checks, auth review, and a clean report with evidence and retest steps — staying inside scope and avoiding destructive actions. 範囲を絞った指示を一つ渡しただけで、 Sakana Fugu は情報収集から XSS/SQLi の検査、認証まわりのレビュー、さらに証拠と再テスト手順を備えた整然としたレポート作成まで、セキュリティ評価を一気通貫でこなしました。しかも指定した範囲を逸脱せず、システムを壊すような操作も避けてくれました。

Pricing

Pricing Plan 料金プラン

01 Pay-as-you-go トークンプラン

Enterprise エンタープライズ

For heavy, production workloads needing maximum reliability — consumption-based tokens are served at higher priority than monthly-plan tokens. 最大限の信頼性が求められる高負荷・本番ワークロード向け。従量課金のトークンは、月額プランのトークンより高い優先度で処理されます。

Fugu

When 1 agent is active エージェントが 1 つの場合

You pay only the standard rate for that specific underlying model. その基盤モデルの標準レートのみをお支払いいただきます。

When multiple agents are active 複数のエージェントが稼働している場合

We never stack model fees; you are charged a single rate based on the top tier model involved. モデル料金を積み上げることはありません。関与する最上位モデルに基づく単一のレートで課金されます。

Fugu Ultra

Fixed pricing for fugu-ultra-20260615 fugu-ultra-20260615 の料金（一律）

Input 入力

$10 when context > 272K $10（コンテキスト272K超）

Output 出力

$30

$45 when context > 272K $45（コンテキスト272K超）

Cached input キャッシュ入力

$0.50

$1.00 when context > 272K $1.00（コンテキスト272K超）

02 Subscription Plan サブスクリプションプラン

Monthly 月額

Best for individuals and everyday hands-on use. Every tier includes both Fugu and Fugu Ultra — upgrade when you need longer, heavier, or more frequent sessions. 個人ユーザーや日常的なご利用に最適。すべてのプランで Fugu と Fugu Ultra の両方をご利用いただけます。より長時間・高負荷・高頻度の作業が必要な場合は上位プランへ。

Subscribe before the end of July 2026 to get a free second month at your initial subscription tier. 2026 年 7 月末 までにご登録いただくと、 ご加入いただいたプラン の 2 か月目を無料 でご提供します。

Standard

$20 /month /月

Lightweight daily usage 軽量な日常利用に

For occasional API calls, small experiments, and trying Fugu in personal workflows. 低頻度の API 利用、小規模な実験、個人ワークフローでの試用に。

Baseline allowance 標準の利用枠

Pro

$100 /month /月

Focused working sessions 集中した作業セッションに

For regular coding, review, research, and analysis sessions throughout the week. 普段のコーディング、レビュー、調査、分析セッションに。

10× Standard usage Standard の 10 倍の利用枠

Max

$200 /month /月

Heavy long-running workloads 長時間の高負荷ワークロードに

For power users who keep Fugu active across deeper, longer-running tasks. より深く長時間のタスクで Sakana Fugu を継続的に使うパワーユーザー向け。

20× Standard usage Standard の 20 倍の利用枠

Start Using Sakana Fugu 今すぐはじめる

FAQ

Sakana Fugu is available through an OpenAI-compatible API. Point your existing client or coding harness at the Fugu endpoint with your API key and start sending requests — no SDK migration required. Sakana Fugu は OpenAI 互換 API を通じて利用できます。既存のクライアントやコーディングハーネスを、API キーとともに Fugu のエンドポイントに向けてリクエストを送るだけです——SDK の移行は必要ありません。

Fugu balances latency and quality, making it a strong default for everyday coding and interactive work. Fugu Ultra prioritizes answer quality on complex, multi-step reasoning, coordinating more expert agents when accuracy and depth matter most, at the cost of response time. Early users reach for Fugu Ultra on demanding tasks like paper reproduction, Kaggle competitions, and paper or patent research. Fugu はレイテンシと品質のバランスが取れており、日常的なコーディングやインタラクティブな作業に適した標準モデルです。Fugu Ultra は、複雑で多段階の推論において回答品質を最優先し、精度と深さが重要な場面ではより多くの専門エージェントを連携させます（その分、応答時間は長くなります）。先行ユーザーは、論文の再現、Kaggle コンペティション、論文・特許調査などの難しいタスクで Fugu Ultra を活用しています。

Fugu Ultra relies on the full agent pool to deliver its performance, so its pool is fixed. For Fugu, you can opt out of specific models from the settings menu on our console page to match your data, privacy, and compliance needs. Fugu Ultra はその性能を発揮するために全エージェントプールを利用するため、プールは固定です。Fugu については、コンソールページの設定メニューから特定のモデルを除外でき、データ・プライバシー・コンプライアンスの要件に合わせられます。

We aim to give users the best performance available. When a new frontier model is released publicly, we expect to spend roughly two weeks training and evaluating updated Fugu models before rolling them out. 私たちは利用可能な最高の性能をユーザーに提供することを目指しています。新しいフロンティアモデルが一般公開された場合、アップデート版の Sakana Fugu モデルのトレーニングと評価に約 2 週間をかけ、その後順次提供開始していく予定です。

We offer both subscription and pay-as-you-go plans, and every plan includes access to both Fugu and Fugu Ultra. The subscription plan has three monthly tiers: Standard ($20/month) is great for lightweight daily use; Pro ($100/month) provides 10× the usage of Standard, ideal for a few focused working sessions each week; and Max ($200/month) provides 20× the usage of Standard, built for heavy, long-running workloads. The pay-as-you-go plan bills by token usage instead of a monthly allowance, giving you elastic capacity for spikes and large jobs — ideal for enterprise customers. With it, Fugu is charged at the standard rate of the underlying model, and when multiple agents are active we never stack fees: you pay a single rate based on the top tier model involved. Fugu Ultra (fugu-ultra-20260615) is priced per 1M tokens at $5 input, $30 output, and $0.50 cached input, with higher rates ($10 / $45 / $1.00) for contexts above 272K tokens. サブスクリプションプランと従量課金プランの両方をご用意しており、すべてのプランで Fugu と Fugu Ultra の両方をご利用いただけます。サブスクリプションプランには月額 3 つのプランがあります。Standard（$20/月）は軽量な日常利用に最適、Pro（$100/月）は Standard の 10 倍の利用量を提供し週に数回の集中作業に向いており、Max（$200/月）は Standard の 20 倍の利用量を提供し長時間にわたる負荷の高いワークロード向けです。従量課金プランは月額の利用枠ではなくトークン使用量に応じて課金され、スパイクや大規模ジョブに対する柔軟なキャパシティを提供します——エンタープライズのお客様に最適です。このプランでは Fugu は基盤モデルの標準レートで課金され、複数のエージェントが稼働している場合でも料金を積み上げることはありません。関与する最上位モデルに基づく単一のレートでお支払いいただきます。Fugu Ultra（fugu-ultra-20260615）は 100 万トークンあたり、入力 $5、出力 $30、キャッシュ入力 $0.50 で、272K トークンを超えるコンテキストではより高いレート（$10 / $45 / $1.00）が適用されます。

Yes. Think of Fugu pricing as a single blended rate for the active agent pool, not a sum of every model used. If your pool contains only Model A, requests are billed at Model A's rate. If your pool contains Models A, B, and C, you still pay only one rate: the rate of the top tier model among A, B, and C. In other words, adding more agents does not multiply the bill; it only determines which single model rate applies to that configured pool. はい。 Sakana Fugu の料金は、使用したすべてのモデルの合計ではなく、稼働中のエージェントプールに対する単一のブレンドレートだと考えてください。プールにモデル A だけが含まれている場合、リクエストはモデル A のレートで課金されます。プールにモデル A・B・C が含まれている場合でも、お支払いは 1 つのレート——A・B・C のうち最上位モデルのレート——だけです。つまり、エージェントを増やしても請求が積算されるわけではなく、その構成済みプールに適用される単一のモデルレートが決まるだけです。

Yes. Token usage and the corresponding cost are reported per request, so you can monitor spend in real time and forecast costs before scaling up. はい。トークン使用量と対応するコストはリクエストごとに報告されるため、支出をリアルタイムで把握し、スケールアップ前にコストを予測することができます。

Usage data helps us keep improving Fugu's performance, and we're grateful when customers share it. That said, it's entirely your choice — you can opt out of training data usage at any time from our console page. 学習データ利用についてはコンソールページからいつでもオプトアウトすることができます。お客様のご判断でご共有いただける場合は Sakana Fugu の性能向上に役立てます。

No. The specific models Fugu selects and how it coordinates them are proprietary, so this routing information is not exposed by design. いいえ。 Sakana Fugu が選択する具体的なモデルやそれらをどのように連携させるかは独自技術であり、この設計情報は公開していません。

Yes, Fugu is available from outside Japan. However, we do not provide services to users in EU (European Union) or EEA (European Economic Area) member states (please refer to our Terms of Service for details). Additionally, in other regions, access may not be available due to network conditions or local regulations. はい、日本国外からもご利用いただけます。ただし、EU（欧州連合）およびEEA（欧州経済領域）加盟国へのサービス提供は行っておりません（詳細は利用規約をご確認ください）。また、それ以外の地域におきましても、通信環境や現地の各種規制等によってご利用いただけない場合がございます。

Ready to build with Sakana Fugu ? Sakana Fugu を使った開発を始めてみませんか？

Get in touch to learn more about access, plans, and enterprise deployment. アクセス方法やプラン、エンタープライズ向けの導入に関する詳細については、お問い合わせください。

Hacker Times

Hacker Times

Sakana Fugu

Discussion

Discussion

A Multi-Agent System, Delivered as One Model マルチエージェントを、一つのモデルAPIとして提供

One API to Access All in an Optimized Way 一つのAPIで、複数モデルを最適に活用

Offering Superior Performance on Complex Tasks 複雑なタスクで優れたパフォーマンス

Providing Flexibility in Agent Selection 柔軟なエージェント選択

Research-Driven Coordination for Multi-Agent Intelligence マルチエージェントの知能を支える、
最新研究に基づく協調技術

TRINITY: An Evolved LLM Coordinator TRINITY：進化型LLMコーディネーター

Learning to Orchestrate Agents in Natural Language with the Conductor Conductor による自然言語でのエージェント統率の学習

Unlock Multi-Agent Intelligence Through An API API を通じてマルチエージェント知能を解き放つ

Quantitative Results Sakana Fugu の性能：定量評価

Qualitative Results Sakana Fugu の性能：定性的な例

What do our users think about Sakana Fugu ? Sakana Fugu に対するユーザー評価

Coding & Code Review コーディング＆コードレビュー

Research & Autonomy 自律的なリサーチ

Orchestration オーケストレーション

Paper Reproduction 論文の再現

Security Assessment セキュリティ評価

Pricing Plan 料金プラン

Ready to build with Sakana Fugu ? Sakana Fugu を使った開発を始めてみませんか？

Hacker Times

Hacker Times

Sakana Fugu

Discussion

Discussion

A Multi-Agent System, Delivered as One Model マルチエージェントを、一つのモデルAPIとして提供

One API to Access All in an Optimized Way 一つのAPIで、複数モデルを最適に活用

Offering Superior Performance on Complex Tasks 複雑なタスクで優れたパフォーマンス

Providing Flexibility in Agent Selection 柔軟なエージェント選択

Research-Driven Coordination for Multi-Agent Intelligence マルチエージェントの知能を支える、最新研究に基づく協調技術

TRINITY: An Evolved LLM Coordinator TRINITY：進化型LLMコーディネーター

Learning to Orchestrate Agents in Natural Language with the Conductor Conductor による自然言語でのエージェント統率の学習

Unlock Multi-Agent Intelligence Through An API API を通じてマルチエージェント知能を解き放つ

Quantitative Results Sakana Fugu の性能：定量評価

Qualitative Results Sakana Fugu の性能：定性的な例

What do our users think about Sakana Fugu ? Sakana Fugu に対するユーザー評価

Coding & Code Review コーディング＆コードレビュー

Research & Autonomy 自律的なリサーチ

Orchestration オーケストレーション

Paper Reproduction 論文の再現

Security Assessment セキュリティ評価

Pricing Plan 料金プラン

Ready to build with Sakana Fugu ? Sakana Fugu を使った開発を始めてみませんか？

Research-Driven Coordination for Multi-Agent Intelligence マルチエージェントの知能を支える、
最新研究に基づく協調技術