I said that about Opus 4.5 at the time, thinking "this is so good, in 6-12 months the Chinese models will be as good and cheap, I will use them", but I was wrong. I now pay a premium for Opus 4.7/4.8 and Fable.
But at some point, it will just do the thing you want it to do, and then the race to the bottom will start.
Now that Chinese companies have access to some very good Fable tokens, I hope it speeds up the race.
My theory is that US enterprises just can't send data to Chinese companies, and that's understandable, but is that "the moat"?
It's good and does most tasks I throw at it well, but it will fail at anything cognitive/complex. It gets stuck often. It costs ~$6 a month though.
Other than that it’s pretty decent (for the price).
so better models may still be cheaper even if the price per token is higher.
Lots of US providers are hosting these “open source” models so doubt that’s the problem.
I also keep trying GPT, which is quite solid. Very fast, great at debugging. But its code is often overly clever and hurts my brain.
(Maybe fixable with prompting. I tried and it helped the Chinese ones a bit. Just tell them to be elegant, like in the old image AI days "+good -bad"!)
For now I do still need my human brain to actually be able to make sense of the stuff, and Claude is the only one that consistently meets that requirement.
But I am hoping that one of these days, one of the Chinese labs figures out the special sauce :)
--
[0] (For smallish edits, though, I am having a great time with DeepSeek Flash. Practically unlimited AI on tap! How cool is that.)
GPT-series models are more thorough and better, but I'm not sure the difference is enormous. It seems to depend on the workflow; in my opinion, if you are thorough enough yourself, I wonder whether there really is a big difference.
The UIs it generates are pretty good: not without problems, but certainly better than other models at this price point.
- GPT-5.5: 62.7%
- Opus 4.8: 62.2%
- Kimi K2.7 Code: 56.3%
- Kimi K2.6: 48.2%
Just in case there are those who'd reflexively downvote this post, I'd just like to say that in a time of great national geopolitical rivalries, this kind of question is not an unreasonable one to ask. Indeed, it's an applicable question whichever nation you live in.
For me, the big thing with MiMo-V2.5-Pro and DeepSeek V4-Pro is that cached inputs are practically free. Kimi K2.7 Code is 53x more expensive for cached inputs which is 95% of my costs.
If I use 95M cached input tokens, 4M input tokens, and 1M output tokens, that'd be: $18 for cached input on Kimi K2.7 Code vs $0.34 with MiMo/DS; $3.80 for inputs on Kimi vs $1.74 with MiMo/DS; and $4 for output on Kimi vs $0.87 with MiMo/DS.
Of all the places where I'm accumulating costs by using Kimi, it's the cached inputs. The real savings with MiMo/DS's price cut is the cached inputs.
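If you want to rerun that arithmetic with your own token mix, here is a rough sketch. The per-million-token rates are back-computed from the dollar totals quoted above (for example $18 / 95M cached tokens), so treat them as assumptions rather than official price lists; the function and variable names are made up for illustration.

```python
# Usage from the comment above, in millions of tokens.
USAGE_MTOK = {"cached_input": 95, "input": 4, "output": 1}

# Dollar totals quoted in the comment for exactly that usage.
QUOTED_TOTALS = {
    "Kimi K2.7 Code": {"cached_input": 18.00, "input": 3.80, "output": 4.00},
    "MiMo/DS": {"cached_input": 0.34, "input": 1.74, "output": 0.87},
}


def rates_per_mtok(totals, usage):
    """Back out an implied $/1M-token rate for each token category."""
    return {kind: totals[kind] / usage[kind] for kind in usage}


def cost(rates, usage):
    """Total dollars for a usage mix at the given $/1M-token rates."""
    return sum(rates[kind] * usage[kind] for kind in usage)


# Implied rates per provider (an assumption derived from the quoted totals).
RATES = {name: rates_per_mtok(t, USAGE_MTOK) for name, t in QUOTED_TOTALS.items()}

if __name__ == "__main__":
    for name, rates in RATES.items():
        total = cost(rates, USAGE_MTOK)
        cached = rates["cached_input"] * USAGE_MTOK["cached_input"]
        print(f"{name}: ${total:.2f} total, {cached / total:.0%} of it cached input")
```

With the quoted mix this comes to roughly $25.80 versus $2.95; swapping in a different `USAGE_MTOK` shows how quickly the gap shrinks when less of the traffic is cached.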
To be clear, the “advertising” clause just requires you to disclose that you use the thing somewhere in the product, such as credits in an “About” section.
I've had some success turning my macbook M1 pro into a heating pad with Qwen 3.6 35B A3B MTP. Trying to use Gemini models "locally" resulted in a similar "short shrift" of effort resulting in mistakes and lots of turns. The reports of Fable being relentlessly "proactive" shows you can go the other direction as well, if you have strong enough branding and effective invoicing.
They are a consultancy in Germany, but I watched a presentation on them tuning and removing bias from Deepseek models. It was quite interesting.
https://www.tngtech.com/en/about-us/news/release-of-deepseek...
(I upvoted your question as I agree)
It's not just code we need to worry about, it's also subliminal messaging and other things.
The CCP is not influencing my Rust code quality that much. Though I did notice all my lifetimes are now 'static because nothing is ever allowed to leave the party's ownership, unsafe blocks require approval from a central committee.
Honestly the scariest part is that shared mutable state is forbidden unless the state is doing the sharing.
Otherwise it is pretty ok.
Once you have a coherent design (the hard part), you can feed it to a pretty small model and get basically the same quality.
They'll not one-shot, but they're faster and cheaper, so it still works out in your favor.
Plus you can do it locally...
https://platform.kimi.ai/docs/guide/kimi-k2-7-code-quickstar...
For the curious: https://news.ycombinator.com/item?id=48498573 - “Claude Fable is relentlessly proactive”.
Xiaomi MiMo ($6/mo: https://platform.xiaomimimo.com/token-plan) & Alibaba Qwen ($50/mo: https://www.alibabacloud.com/en/campaign/ai-scene-coding) have generous limits on fixed subscriptions.
Don’t you know there’s no honor among thieves?
The term seems to have the connotation of "competitive at 1/10 the price of Claude", so I don't see the problem.
It's not Harbor Freight Chinese (and heck even they have decent stuff sometimes now too).
You don't think people still talk about Japanese cars as a distinction in quality from US or European ones?
Edit: Downvoting something doesn't make it false.
Cursor had an "agreement" with Fireworks.ai, which apparently allowed them to RL Composer 2 atop Kimi Base 2.5 without attribution: https://x.com/Kimi_Moonshot/status/2035074972943831491 / https://archive.vn/CcdkI
Composer 2 performed differently on evals than Moonshot.ai's coding models: Cursor claims theirs is better than Claude Opus 4.6: https://x.com/fynnso/status/2034706304875602030 / https://archive.vn/bVtik. And, per Lee Robinson (Cursor employee), it is very likely Cursor builds its own foundational model for Composer 3.
They return instructions for you to do something, and you or a script you permit chooses to execute what the model tells you and return the result to the model.
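A toy sketch of that loop, in case it helps. Nothing here talks to a real model: `fake_model`, the message shapes, and the `ALLOWED_TOOLS` allow-list are all invented for illustration. The point is only that the model emits a request, and your own code decides whether to run it and what to send back.

```python
import json
from typing import Callable

# Only tools listed here can ever be executed, whatever the model asks for.
ALLOWED_TOOLS: dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
}


def fake_model(messages):
    """Stand-in for an LLM: requests one tool call, then answers with its result."""
    if messages[-1]["role"] == "tool":
        return {"content": f"The result is {messages[-1]['content']}."}
    return {"tool_call": {"name": "add", "arguments": json.dumps({"a": 2, "b": 3})}}


def run(messages, model=fake_model, max_steps=5):
    """Drive the loop; note that `messages` is extended in place with tool results."""
    for _ in range(max_steps):
        reply = model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # plain answer, nothing to execute
        tool = ALLOWED_TOOLS.get(call["name"])
        if tool is None:
            result = f"error: tool {call['name']!r} is not permitted"
        else:
            result = tool(**json.loads(call["arguments"]))
        # The model only ever sees the text we choose to hand back.
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("model did not finish within max_steps")


if __name__ == "__main__":
    print(run([{"role": "user", "content": "What is 2 + 3?"}]))
```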
[1] https://www.tomshardware.com/tech-industry/artificial-intell...
Only that it’s a fairly meaningless grouping. When Japan first entered the car market in North America there might have been some commonality, but now what characteristics do they share that some American cars don’t have? They’re not even imported a lot of the time.
Given that, it does start to feel tinged with racism if someone insists on grouping things together that don’t really belong together.
As for Chinese LLMs, the term doesn’t “feel” pejorative to me, but I also don’t see a totally clear set of attributes they share. Not all are open-weight. Some are small and can be run on consumer hardware, some are huge. They even have a variety of answers to what happened June 3rd 1989.
And even if the Chinese Communist Party provided funding, the result is still transparently released. So even if it is some kind of propaganda, I don't see what the problem is.
Is the monopolistic greed of American companies 'good', and China's greed 'bad'? I do have that question.
RL has a tendency to reinforce cheating when the cheats are easier to find than the final solution.
So when making your RL environment, you need to spend a lot of effort on finding ways the model can cheat and penalizing them.
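As a toy illustration of what "penalizing the cheats" can look like for a coding task (every name here is hypothetical, and real environments need far more detection logic than two boolean flags):

```python
from dataclasses import dataclass


@dataclass
class Episode:
    tests_passed: int
    tests_total: int
    edited_test_files: bool      # the agent modified the tests it is graded on
    hardcoded_expected: bool     # the solution special-cases the test inputs


def reward(ep: Episode, cheat_penalty: float = 1.0) -> float:
    """Pass rate as reward, unless a known cheat was detected."""
    if ep.edited_test_files or ep.hardcoded_expected:
        # A detected cheat must score worse than an honest failure,
        # otherwise the shortcut is still the easiest thing to learn.
        return -cheat_penalty
    if ep.tests_total == 0:
        return 0.0
    return ep.tests_passed / ep.tests_total
```

The hard part is the detection, not the arithmetic: each flag stands in for a check someone had to think of before the model found the loophole.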
I have a feeling you'd be slightly salty at people saying "Google and Tesla are making CIA models"
Better overall design?
We live in a time of a great geopolitical rivalry and high tensions with an emergent technology with tons of national security implications. To pretend otherwise is silly, and to fail to ask the question, dangerous.
Typically the answer is "reliability", which is a positive trait, which makes the original callout about negative connotations very odd to me.
China is a communist country with elements of capitalistic markets baked in. But the capitalistic elements are mostly a facade. Underneath, the state retains full ownership and control of all business. The CCP runs all aspects of the government (including the courts/judges), and is the single entity that decides what directions the country (and its businesses) will move in.
The CCP, which de facto owns everything and has ultimate final say on everything, has one leader who has the ultimate final say on _everything_, Xi Jinping.
So while the waters of CCP models feel warm and free, understand it's not organically like that.
Since its development, IQT has invested in over 750 startups spanning diverse technological sectors, including:
- Artificial Intelligence
- Space Technologies
- Microelectronics and Quantum Computing
- Life Sciences
- Cybersecurity
- Hardware
- Energy
This broad portfolio has enabled IQT to address a wide array of national security challenges while supporting the growth of innovative startups… https://en.wikipedia.org/wiki/In-Q-Tel
https://www.npr.org/sections/alltechconsidered/2012/07/16/15...
In China it's all one entity with these mock facades of privatization. Trump cannot instruct Google to put pictures of dogs on their homepage. If Xi wakes up and wants dogs on Alibaba's homepage, give it 30 minutes.
It's wholly ignorant or dishonest to make the comparison.
We absolutely know that we can't trust the American model not to do that - it's "by the oligarchs, for the oligarchs" - so it's not clear what the claim really is.
While I get the point you're making (it should be pretty obvious to anyone who's held a newspaper), I think it's important regardless to point out that Chinese companies AFAIK aren't worker-owned or -controlled, so you can't exactly call it communism, either. And they obviously do not have a "free market capitalism", as you just discussed.
It's simply a highly authoritarian state then, I guess?
Tim Apple and the other tech CEOs constantly groveling at Trump’s feet indicates that he might be able to do that.
Just like threatening TV networks with having their licenses revoked, or blocking mergers unless they fire the people making fun of him on TV (of course with slightly mixed success).
Sundar Pichai would personally be barking on a livestream on the homepage.
Trump is quite literally the one president showing that the US has zero rules or anything to hold power back from the white house, really not the example you want.
Sundar can do whatever he wants, but he has no legal obligation to do any of it.
Or that pesky CCP censorship and propaganda baked into the model, which any random guy can remove from whichever model they want as a single weekend side project with an off-the-shelf tool[1]. (Try it. It's fun. I've done it myself.)
As such, the state owns everything in both countries, the only differences are to what extent they control things.
I wouldn't even call the USA a capitalist system anymore, the economy is so heavily regulated and interfered with. It's a "managed economy", like pretty much every other nation's economy in the present day.
e.g. he had Colbert fired (and who knows what else) by threatening to block the Paramount/Skydance merger
I'm sorry, but that was a horrible example. Corporations have no obligation to donate money to the ballroom yet Google has donated millions.
Imagine living in a country where they have the obligation.
Exactly why my prime suspect would be the one country with focus on proprietary models, and the one country prone to bombing others, including with nuclear weapons.
So yes, there is geopolitical rivalry, but one side is deliberately antagonistic (not releasing anything in the open, putting arbitrary restrictions, spewing toxic rhetoric, applying sanctions, etc.) while the other side is letting everyone (including their rivals) use what they've produced with little to no restrictions.
I'm under no illusion that if the situation was reversed China would most likely do the same, but as things stand you can probably guess which side I'm rooting for here (at least until the roles reverse).

Kimi K2.7 Code is a coding-focused agentic model built upon Kimi K2.6. With substantial improvements on real-world long-horizon coding tasks, it strengthens end-to-end task completion across complex software engineering workflows while improving token efficiency, reducing thinking-token usage by approximately 30% compared with Kimi K2.6.

| | |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense layer included) | 61 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
| Vision Encoder | MoonViT |
| Parameters of Vision Encoder | 400M |
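
To put the table's sparsity figures in perspective, the short sketch below works out what fraction of the model is active per token. It uses only the numbers listed above; the assumption that the shared expert always runs in addition to the routed ones reflects typical shared-expert MoE designs and is not stated in this card.

```python
TOTAL_PARAMS = 1_000_000_000_000   # "1T" total parameters
ACTIVE_PARAMS = 32_000_000_000     # "32B" activated parameters
NUM_EXPERTS = 384                  # experts per MoE layer
SELECTED_EXPERTS = 8               # routed experts chosen per token
SHARED_EXPERTS = 1                 # assumed to run for every token


def moe_sparsity():
    """Summarize how sparse the forward pass is, from the table's figures."""
    return {
        "active_param_fraction": ACTIVE_PARAMS / TOTAL_PARAMS,
        "routed_expert_fraction": SELECTED_EXPERTS / NUM_EXPERTS,
        "experts_run_per_token": SELECTED_EXPERTS + SHARED_EXPERTS,
    }


if __name__ == "__main__":
    for name, value in moe_sparsity().items():
        print(f"{name}: {value}")
```

In other words, roughly 3.2% of the parameters take part in any single token's forward pass.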

| Benchmark | Kimi K2.6 | Kimi K2.7 Code | GPT-5.5 | Claude Opus 4.8 |
|---|---|---|---|---|
| Coding | ||||
| Kimi Code Bench v2 | 50.9 | 62.0 | 69.0 | 67.4 |
| Program Bench | 48.3 | 53.6 | 69.1 | 63.8 |
| MLS Bench Lite | 26.7 | 35.1 | 35.5 | 42.8 |
| Agentic | ||||
| Kimi Claw 24/7 Bench | 42.9 | 46.9 | 52.8 | 50.4 |
| MCP Atlas | 69.4 | 76.0 | 79.4 | 81.3 |
| MCP Mark Verified | 72.8 | 81.1 | 92.9 | 76.4 |
Kimi-K2.7-Code adopts the same native int4 quantization method as Kimi-K2-Thinking.
You can access Kimi-K2.7-Code's API on https://platform.moonshot.ai and we provide OpenAI/Anthropic-compatible API for you. Currently, Kimi-K2.7-Code is recommended to run on the following inference engines:
Kimi-K2.7-Code has the same architecture as Kimi-K2.5/Kimi-K2.6, and the deployment method can be directly reused.
The version requirement for transformers is >=4.57.1, <5.0.0.
Deployment examples can be found in the Model Deployment Guide.
The usage demos below demonstrate how to call our official API. Note that Kimi-K2.7-Code forces thinking and preserve_thinking as True.
For third-party APIs deployed with vLLM or SGLang, please note that:
- Chat with video content is an experimental feature and is only supported in our official API for now.
- The recommended `temperature` is `1.0` for Thinking mode.
- The recommended `top_p` is `0.95`.
- Instant mode is not supported.
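
For a third-party vLLM or SGLang endpoint, the notes above (temperature 1.0, top_p 0.95, no video input) could be applied with a small helper like the following. This is a minimal sketch: the helper name and the choice to reject video parts up front are illustrative assumptions, not part of any official SDK.

```python
def third_party_chat_kwargs(model_name, messages, max_tokens=4096):
    """Build chat-completion keyword arguments for a vLLM/SGLang-hosted endpoint."""
    for message in messages:
        content = message.get("content")
        # Multimodal messages carry a list of typed parts; plain text is a string.
        if isinstance(content, list):
            for part in content:
                if part.get("type") == "video_url":
                    raise ValueError("video input is only supported by the official API")
    return {
        "model": model_name,
        "messages": messages,
        "temperature": 1.0,  # recommended for Thinking mode
        "top_p": 0.95,       # recommended
        "stream": False,
        "max_tokens": max_tokens,
    }
```

The result can be passed straight through, for example `client.chat.completions.create(**third_party_chat_kwargs(model_name, messages))`.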
This is a simple chat completion script which shows how to call K2.7-Code API in Thinking mode.
```python
import openai


def simple_chat(client: openai.OpenAI, model_name: str):
    messages = [
        {'role': 'system', 'content': 'You are Kimi, an AI assistant created by Moonshot AI.'},
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'which one is bigger, 9.11 or 9.9? think carefully.'}
            ],
        },
    ]
    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=4096
    )
    # The reasoning trace is returned separately from the final answer.
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')
```
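
The demo functions take an already constructed `openai.OpenAI` client. One way to build it is sketched below. The environment variable names are illustrative, and the default base URL is an assumption about the OpenAI-compatible endpoint that should be confirmed on https://platform.moonshot.ai before use.

```python
import os


def client_settings(env=None):
    """Read connection settings; the variable names are illustrative, not official."""
    env = os.environ if env is None else env
    return {
        # Assumed OpenAI-compatible endpoint; verify it against the platform docs.
        "base_url": env.get("MOONSHOT_BASE_URL", "https://api.moonshot.ai/v1"),
        "api_key": env["MOONSHOT_API_KEY"],
    }


def make_client(env=None):
    """Create the client that the demo functions expect."""
    import openai  # imported lazily so client_settings() works without the SDK installed

    return openai.OpenAI(**client_settings(env))
```

With the key exported, a demo can then be run as `simple_chat(make_client(), model_name)`, where `model_name` is whatever identifier your endpoint lists for Kimi-K2.7-Code.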
K2.7-Code supports Image and Video input.
The following example demonstrates how to call K2.7-Code API with image input:
```python
import openai
import base64
import requests


def chat_with_image(client: openai.OpenAI, model_name: str):
    url = 'https://huggingface.co/moonshotai/Kimi-K2.7-Code/resolve/main/figures/kimi-logo.png'
    # Download the image and embed it in the request as a base64 data URL.
    image_base64 = base64.b64encode(requests.get(url).content).decode()
    messages = [
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'Describe this image in detail.'},
                {
                    'type': 'image_url',
                    'image_url': {'url': f'data:image/png;base64,{image_base64}'},
                },
            ],
        }
    ]
    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=8192
    )
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')
```
The following example demonstrates how to call K2.7-Code API with video input:
```python
import openai
import base64
import requests


def chat_with_video(client: openai.OpenAI, model_name: str):
    url = 'https://huggingface.co/moonshotai/Kimi-K2.7-Code/resolve/main/figures/demo_video.mp4'
    # Download the video and embed it in the request as a base64 data URL.
    video_base64 = base64.b64encode(requests.get(url).content).decode()
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the video in detail."},
                {
                    "type": "video_url",
                    "video_url": {"url": f"data:video/mp4;base64,{video_base64}"},
                },
            ],
        }
    ]
    response = client.chat.completions.create(model=model_name, messages=messages)
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')
```
Kimi K2.7 Code forces preserve_thinking mode, which retains full reasoning content across multi-turn interactions and enhances performance in coding agent scenarios.
This feature is enabled by default and can't be disabled. The following example demonstrates how to call K2.7-Code API in preserve_thinking mode:
```python
import openai


def chat_with_preserve_thinking(client: openai.OpenAI, model_name: str):
    messages = [
        {
            "role": "user",
            "content": "Tell me three random numbers."
        },
        {
            "role": "assistant",
            # Some APIs (e.g. vLLM) may not support reasoning_content; you can try reasoning instead.
            "reasoning_content": "I'll start by listing five numbers: 473, 921, 235, 215, 222, and I'll tell you the first three.",
            "content": "473, 921, 235"
        },
        {
            "role": "user",
            "content": "What are the other two numbers you have in mind?"
        }
    ]
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
    )
    # The assistant should mention 215 and 222, which appear only in the prior reasoning content.
    print(f"reasoning content: {response.choices[0].message.reasoning}")
    print(f"response: {response.choices[0].message.content}")
    return response.choices[0].message.content
```
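
In a real multi-turn session the earlier assistant turns are not hand-written as in the example above; they come from previous responses. The helpers below are a minimal sketch of carrying the reasoning forward. The function names are hypothetical, and `reasoning_key` defaults to `reasoning_content` as in the example, with `reasoning` as the alternative mentioned for servers such as vLLM.

```python
def append_assistant_turn(messages, content, reasoning, reasoning_key="reasoning_content"):
    """Return a new history with the assistant reply and its reasoning attached."""
    turn = {"role": "assistant", "content": content}
    if reasoning:
        # Keeping the reasoning lets the next turn refer back to it.
        turn[reasoning_key] = reasoning
    return messages + [turn]


def append_user_turn(messages, text):
    """Return a new history with the next user message added."""
    return messages + [{"role": "user", "content": text}]
```

After each call, something like `history = append_assistant_turn(history, msg.content, msg.reasoning)` followed by `append_user_turn(history, next_question)` keeps the full reasoning in the conversation that is sent back.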
K2.7-Code shares the same design of Interleaved Thinking and Multi-Step Tool Call as K2 Thinking. For usage example, please refer to the K2 Thinking documentation.
Kimi K2.7-Code works best with Kimi Code CLI as its agent framework — give it a try at https://www.kimi.com/code.
Both the code repository and the model weights are released under the Modified MIT License.
If you have any questions, please reach out at support@moonshot.ai.