It feels like juggling pipe bombs and I have a ton of empathy for the teams being pressured by the business to roll them out with no appreciation for the regulatory rat's nest that ensues.
More industry exposure to well-managed agentic experiences will create oodles of opportunities to reduce premiums for consumers and offset some inflation-driven increases in the cost of coverage.
However, the result (an Excel spreadsheet) looks different each time you run it, which is annoying when you run it at the end of each month.
BTW: this is not surprising when you look at how little detail the skills contain.
"ready-to-run agent templates for the most time-consuming work in financial services: building pitchbooks, screening KYC files, and closing the books at month-end"
Ok, maybe you can squeeze a vaguely passable pitchbook out of Claude.
But screening KYC files or closing books at month-end?
"I'll have some of what they're smoking" as the cool kids say.
No regulator or tax office on this planet is going to accept the "but Claude said it was ok" excuse.
The only people who are going to profit from this are Anthropic, lawyers, and governments (through increased fines).
What I predict instead is that we will have a common UI-layer plugin and a "protocol" that can speak to UI elements; this might be more composable.
As someone who has been interviewing lately, I think this is the next step after leetcode and whiteboard style interviews.
The templates being: pitch builder, meeting preparer, earnings reviewer, model builder, market researcher, valuation reviewer, general ledger reconciler, month-end closer, statement auditor, KYC (Know Your Customer) screener.
Seems pretty scattershot. Reminds me of GPT Store.
This probably killed a thousand startups in this space.
In the early internet you wouldn't see Google creating their own news site or Facebook building their own animal farm. What happened to platformication of everything?
I've really only seen it used for research/exploration thus far, either for economic research slide decks or for exploring trading hypotheses.
Better Call Saul when (not if) it does.
Why didn’t I think of that.
Is the plan to have an LLM do everything? And do it worse?
"Oh yeah my Claude didn't agree with the pitch from their Claude"
The goal of current tech is to make humanity a gerbil running on a Claude wheel
Why, they can sell user data to other brokers. Experts indeed! But not in insurance or finance, of course.
Less cynically, you might say that "use AI to do <obvious thing>" is not really a viable startup pitch anymore. That's not necessarily bad.
Just yesterday I told a colleague that he should buy some of their vests for his company :-D
LLMs do not change the equation all that much: humans' ability to imagine is the scarcest resource on the planet, and LLMs will not help all that much with it.
Luckily there is still a significant market for the services.
But there's a process risk here based on their current practices. I'm hoping those practices change so that I can recommend Claude to everyone I know, but as of now, there's existential risk exposure here that's greater than Google's.
Anthropic's automated systems can and will ban you for pretty arbitrary things, and you won't get human support, or Claude back, even if you are an enterprise paying through the nose. And there's zero redress unless you go viral on social media. Or know someone who knows someone. See: https://x.com/Whizz_ai/status/2051180043355967802 https://x.com/theo/status/2045618854932734260
And I say that as someone who likes how Anthropic has been training Claude and Opus. I just don't think they're prepared to be the trillion dollar company they've become. They are – in a very real way – suffering from success. Which is extremely inconvenient to be on the receiving end of when you're on a deadline.
Any idea how they ensure this doesn't happen? As in, how can a user verify that the model did not touch any of the numbers and that it only built pipelines for them?
What I've been telling my CFO, who wants to get AI involved in things, is that for a lot of accounting and finance work "trust but verify" doesn't work, because verifying is often the same process as doing the work.
Build a deterministic query set and automate it for monthly or daily reporting reconciliation.
Leave AI out of it.
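A deterministic reconciliation like the one described can be sketched in a few lines; the field names (`gl_code`, `amount`) are placeholders, not any particular system's schema:

```python
from collections import defaultdict

def reconcile(ledger_rows, source_rows, key="gl_code"):
    # Deterministic by construction: the same inputs always produce the
    # same diff, so this can be scheduled nightly or monthly unattended.
    totals = defaultdict(float)
    for r in ledger_rows:
        totals[r[key]] += r["amount"]
    for r in source_rows:
        totals[r[key]] -= r["amount"]
    # Only return accounts that don't net to zero, i.e. the exceptions.
    return {k: round(v, 2) for k, v in totals.items() if round(v, 2) != 0}
```

Anything this returns is an exception a human investigates; an empty dict means the books tie out, with no model in the loop.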
My money's on that.
I’ve also had some great results with a /reflect skill that asks the agent to look at the work in the broader context of the project. But those are the only two skills I use regularly that aren’t specific to our company, codebase, or tools.
Is this a serious question?
Without the big labs with deep pockets investing to change the consumer mindset do you think a small company with no funding has any chance of even existing?
I remember when paying $1.99 for a mobile game on iOS was considered too expensive, and now it seems most consumers are primed to spend more on in-app purchases every week. That mind-shift did not happen overnight.
It was not that long ago $200 for ChatGPT subscription was considered extravagant but now even wrappers can charge this price without hesitation - some of them do.
What Anthropic is doing is priming the market of which they will be potentially one of the main beneficiaries as long as they can continue existing. But I don't think anyone will go to Anthropic directly to source their financial services agent. They will go to financial service companies that use Anthropic to build the capabilities.
What's even sadder is it can work for way too long.
Nowhere near self-sufficient tools though; just great for answering questions over the data that would usually take a few hours of custom scripting/Excel. Frankly, I wouldn't trust our stakeholders using AI directly either.
> I've really only seen it used for research / exploration thus far
Summaries and translation for sure.
Speaking with devs in the field, I know that AI tools are used to summarize and extract data from... PDFs. Now, thankfully, LLMs got better at answering "How many 'r's in 'strawberry'?" and it looks like they're good enough for summarizing PDFs and extracting key numbers, but I'd still be cautious.
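One cheap guardrail when an LLM extracts key numbers from a PDF is to check that each extracted figure literally appears in the document text. A minimal sketch (the tokenization and normalization rules here are assumptions, not a production verifier):

```python
import re

def number_in_source(extracted, source_text, tol=1e-6):
    # Accept an LLM-extracted figure only if it appears verbatim in the
    # source text (after stripping thousands separators). This catches
    # outright hallucinated numbers, not misattributed ones.
    for tok in re.findall(r"[\d][\d.,]*", source_text):
        try:
            val = float(tok.replace(",", ""))
        except ValueError:
            continue  # e.g. malformed runs like "1.2.3"
        if abs(val - extracted) <= tol:
            return True
    return False
```

It won't tell you the model attached the number to the right concept, but it's a deterministic first filter before a human looks.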
And I've got a friend who's a translator specifically for financial documents: she's a contractor and getting about 1/10th of the work (and 1/10th of the pay) she used to have, since now she's only tasked to verify that the translations are correct. Of course she already had lots of tools, way before the LLM era, automating some of her work, but she was still billing the use of those tools. Now LLMs are doing nearly all the work, and not "for her": it's happening upstream, and she only gets the output of the LLMs and has to verify it. And there aren't that many errors.
https://www.bloomberg.com/professional/insights/press-announ...
https://www.lawnext.com/2025/05/ai-hallucinations-strike-aga...
It might be lower stakes, but isn't that still a juicy target for data-exfiltration attacks?
In other words, imagine if one of your direct competitors was watching everything your employee read while making spreadsheets and slideshows.
Currently we don't know the risk, so it is kind of hard to absorb.
Code review has become unbearable because, before AI, developers were reviewing code as they wrote it in the first place. Granted, that was never perfect, which is why a second person reviewing code was (is?) a best practice. But effectively there was always some level of code review happening as developers wrote code.
I fear it is way more boring to review financial and medical documents completely written by AI than it is to write (and at the same time review) by yourself. And way more dangerous to ship mistakes than in most software.
Here are some of the horrible things I've seen. A frontend dashboard with PHI/PII deployed via Vercel/Next because AI told them how to get their site online. The login is hardcoded into the frontend, so anyone with inspect can find the password.
Another "fixed" dashboard deployed the same way. This time they added Firebase auth, so they got sign-in with Google, supposedly restricted to our domain. Wait, how would they be able to create a token for our domain? They didn't; the frontend just blocks other domains from calling firebase.auth, but Firebase doesn't care. So simply calling the function in the console lets me log in with any Gmail account...
They were also showing me their RBAC with Firebase. Again, they don't have access to our Organization/Directory/Groups, so I wondered how they did this... wouldn't you guess, it's a hardcoded list of approved users. You can literally call firebase.auth and sign in anonymously. Again, only the frontend checks the email addresses. And once I have a Firebase auth token, all the backend Firebase functions just check that you have auth'd, so I can make any request I want to the backend. The frontend simply won't show me the code.
I could go on and on about the stupidity levels I'm facing but I don't feel like crashing out.
All I can say is this tool is only useful if you already know how to correctly implement these things. Does it save me time? Sure, but I have to call out its mistakes and explain why not to do things. Honestly, I feel like Claude is good for people who like to gamble. When it gets it right it feels great, but I don't want to roll the dice 30 times to get it correct.
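The frontend-only checks described above fail because authorization never reaches the server. A minimal sketch of doing the check server-side (a hypothetical handler, not tied to Firebase; the domain is a stand-in):

```python
ALLOWED_DOMAIN = "example-clinic.com"  # assumption: stand-in for your org's domain

def authorize(verified_email):
    # verified_email must come from a server-side verified ID token
    # (e.g. checked against the identity provider), never from anything
    # the client self-reports. The frontend hiding a button is not auth.
    if not verified_email or not verified_email.endswith("@" + ALLOWED_DOMAIN):
        raise PermissionError("not authorized")
    return True
```

The point is placement, not sophistication: the allowlist has to live where the data is served, so calling the backend directly gains an attacker nothing.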
Sadly this sounds like par for the course when it comes to tech. Too many messages and requests for help depend on knowing someone in the right slack groups.
I feel like there’s a metaphor in there... maybe I’ll ask Claude about it.
The AI is an expert in both following and generating prompts.
I have given up on trying to get through to him how bad of an idea this is. He's unemployed and has been working on this for over a year.
No, why would they if they have the choice?
> what happened to platformication of everything?
Business happened. The web works differently from how it used to. The users are different. LLM inference and AI tools are a different core product from search and ads. That, and we have the benefit of hindsight now. Maybe a Google newsroom would've actually been a good idea in 2006, who knows.
Also realistically you could say the same thing about Google Maps and Street View. That probably also killed some startups. Google isn't running a charity for startups.
I think someone stated it clearly - they can't take on these kinds of businesses until they build out the risk side and the personnel, all of which is a human problem not a tech one. A lot of processes still require physical steps and backstops because it's not possible to source all the data needed to act on it in the first place. Then you have audits and reconciliations, a bunch of strict workflow rules and atomicity to reach levels of software that bigger financial institutions would accept.
My gut reaction to stuff like this is a mix of "oh shit, they could take over my company" and "they're the next script kiddy that thinks software is anywhere near a majority of the work in some software spaces".
Google News was definitely a thing (and actually still exists).
It seems the initial product footprint tries to sidestep this problem by not giving the agents control on who to lend to or which applications to approve. Even so I think it's quite an optimistic read on their end. Happy to share reports to anyone who's interested (montana@latentevals.com), especially if you work at a frontier model lab and are interested in plugging my evals into your RL systems!
I don't necessarily disagree with that but doing it through LinkedIn slop companies? Come on man you know better than that
They are also fighting for their lives because these insane valuations simply aren’t justified by being dumb pipes. Fortunately, open weights models are widely available and have crossed a threshold of usefulness that cements their place as good substitutes.
The issue with that is obviously that most of the generated value would be captured by that company in the middle, while Anthropic would stay in the cost-conscious inference market.
Yes they can? They have infinitely more cash to pay off any risk. What do you need personnel for besides sign-off, if the AI does it right?
I'm in that space so naturally interested in what people are up to :)
For research and theses evaluations, we're observing that firms - of names we all know - are bullish and even eager to try AI products.
Regarding automated asset management and the likes, indeed there's much more apprehension.
2. I’m almost certainly talking about health insurance, made obvious by you even mentioning that. There’s a HN guideline about discussing in good faith.
3. I find it humorous you hand-wave away our inhuman healthcare system as “for a variety of reasons”.
4. I see your career is in hedge funds, defense, and big tech. Best of luck ;)
As mentioned the problems with the US healthcare system are numerous, complex, and interrelated. I don't think they have a simple solution, nor do I think they are insurance problems at their core. For example the cost of drugs in the US vs the rest of the world has very little to do with insurance.
The analysis itself; I'm doing it by hand.
Far too often people think productivity is the point. Maybe the point is developer's understanding of the product IS the product?
You're not engineering black boxes, you're engineering legible boxes.
Before, some idiot would pitch their stupid idea to dozens of local webdev companies and banks and get told dozens of times their idea is straight up stupid and never going to work and they are stupid.
Now these LLMs allow them to bypass all of that advice and create what they want without any input, or even knowing how the tech behind it works.
We are so fucked lol
We're not talking about what is best for the consumer (ex more competition to force iterations and improvements), but what Anthropic thinks is best for Anthropic.
Will Anthropic externalize the risk, selling access to agents? Or will internalize the risk and liability, selling financial services? Maybe both? I guess lots of companies want both, doing some things internally and keeping other things at arms length by outsourcing to 3rd party accountants.
Though we've had a few incidents where employees have submitted AI-generated receipts for reimbursement, which is another issue...
Maliciously constructed text that goes into the LLM from basically anywhere (including, say, fetched stats about a competitor's product from their website) is a potential source of prompt-injection.
Once that happens, exfiltration can be as simple as generating a spreadsheet/doc with a link or small auto-loaded image, and an URL that has data base64'ed into it.
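A crude defense is to scan agent-generated output for URLs carrying long opaque query values before the document leaves the building. A heuristic sketch (the regex, charset, and length threshold are assumptions, not a complete filter):

```python
import re

# Heuristic, illustrative only: a URL whose query parameter is a long
# base64-looking blob is a classic exfiltration channel in generated
# docs (e.g. an auto-loaded 1px image whose URL encodes stolen data).
OPAQUE_PARAM = re.compile(r"https?://[^\s\"'<>]+[?&]\w+=[A-Za-z0-9+/_-]{40,}")

def suspicious_urls(generated_text):
    # Returns every URL in the text that matches the heuristic, for a
    # human (or stricter policy engine) to review before release.
    return OPAQUE_PARAM.findall(generated_text)
```

A real deployment would pair this with an outbound allowlist, since short exfil payloads and attacker-controlled "legitimate" domains slip past any regex.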
If AI is really as wonderous as everybody says, why didn't all the employees of all the AI companies simply type "Claude, file my taxes for me" as a prompt and walk away?
But more often than not that developer ends up reviewing far more lines of code due to the typical verbosity of an LLM.
How do you verify that all the tariffs are properly allocated to the correct GL code without going through the invoices and checking for each tariff on the list? How do you make sure none were accidentally assigned to other GL codes? All you have is pdfs, you dont know what the AI did or didnt do with the info on the pdf, there are not many use-cases to catch its errors without doing the work yourself.
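To be fair, a deterministic cross-check can catch some miscoding without re-reading every PDF, though it only moves the trust problem upstream to whether the line items were extracted faithfully in the first place. A sketch, where the account number and field names are hypothetical:

```python
TARIFF_GL = "2150"  # assumption: hypothetical GL account for tariff duties

def miscoded_lines(invoice_lines):
    # Rule-based check: every line flagged as a tariff must post to the
    # tariff account, and nothing else may post there. Any miscoding by
    # the AI (or a human) surfaces as an exception; the rule itself is
    # explicit and auditable, unlike an LLM's judgment call.
    return [
        l for l in invoice_lines
        if (l["is_tariff"] and l["gl_code"] != TARIFF_GL)
        or (not l["is_tariff"] and l["gl_code"] == TARIFF_GL)
    ]
```

This doesn't answer "did the extraction read the PDF correctly," which is the commenter's deeper point, but it does make the allocation step checkable without redoing it.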
If anything, it's going to add a step to these "kids" work where they have to use the AI to do the work and then redo 90% of the work anyway just to verify the output and then AI is going to get the credit anyway.
Or the overworked people are going to use AI and not verify it, which means not catching any errors or hallucinations, which apparently is fine because someone claims it's a solved problem for the black box of infinite possibility and inconsistent output.
All I did was upgrade claude code and use the new model. It most definitely exhibits misaligned behavior (compared to 4.6)
To put this with less snark: if you're not already waking up to the AI completing tasks for you that you didn't even ask for, you've fallen behind the curve. A good personal assistant does what you ask, a better personal assistant knows what you need before you do and has it completed before you reach your desk. AI is already reaching into that latter category.
For example, Codex can review code written by Claude, etc.
The work BIG-IP is doing on LLM traffic analysis is cool though.
You're a funny one aren't you...
Meet "Fin", Anthropic's "where support questions go to die" so-called support bot, created by Intercom but powered by Anthropic.
Maybe it's an internal in-joke in the Anthropic offices... "fin" in French means "end".
I don't know anyone who has had a positive experience with "Fin" .... or ever spoken to a human at Anthropic support for that matter, even if you ask "Fin" to escalate.
Customer support and safety are cost centers. It doesn’t scale like software does and no one’s KPIs are going to improve dramatically if you provide support beyond a point.
AI and LLMs are the cool tech, and the most important thing is to push the frontier. Money spent elsewhere is money not spent on R&D.
It would be hilarious if it wasn’t the GDPs of nations being spent on this.
When management signs off on work (SOX requires CEOs and CFOs to personally certify the accuracy of financial reports), they do not personally 'verify that all the tariffs are properly allocated to the correct GL code' or nearly any other hard numbers. The world works with human-level best effort, and management of that risk. I'm sure additional checks will be developed to categorize that risk, but the entire field of finance is about analyzing and pricing in risk so I think it'll work just fine.
For anything math, it’s much more reliable to give agents tools. So if you want to verify that your real estate offer is in the 90–95th percentile of offerings in the past three months, don’t give Claude that data and ask it to calculate. Offload to a tool that can query Postgres.
Similar with things needing data from an external source of truth. For example, what payers (insurance companies) reimburse for a specific CPT code (medical procedure) can change at any time and may be different between today and when the service was provided two months ago. Have a tool that farms out the calculation, which itself uses a database or whatever to pull the rate data.
The LLM can orchestrate and figure out what needs to be done, like a human would, but anything else is either scary (math) or expensive (it using context to constantly pull documentation.)
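That offloading pattern can be sketched with sqlite3 standing in for Postgres; the table and column names are assumptions for illustration:

```python
import sqlite3

def offer_percentile(conn, offer_price):
    # The percentile is computed in SQL, not by the model; the model's
    # only job is deciding to call this tool and interpreting the answer.
    row = conn.execute(
        "SELECT 100.0 * SUM(price <= ?) / COUNT(*) FROM offers",
        (offer_price,),
    ).fetchone()
    return row[0]

# Toy data standing in for "offers in the past three months".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE offers (price REAL)")
conn.executemany(
    "INSERT INTO offers VALUES (?)",
    [(p,) for p in (400_000, 410_000, 420_000, 450_000, 500_000)],
)
```

An agent framework would register `offer_percentile` as a callable tool; the arithmetic stays deterministic regardless of what the model does with the result.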
Everyone wants in on my daily excel auto generated reports - nobody ever opens them. Just being on the list makes you someone.
Very frustrating.
As a business, you've also got to remember that employees are much more likely to complain if the 'agent' or any other form of automation errs by denying their claim or underpaying than the reverse. Depending on the scale of expenses and how likely you are to be audited, the cost of the odd mistake might be more or less than the cost of doing it manually.
1. It costs nothing to scatter poisonous data around that'll be infectious for ages
2. Running the exfiltrated-data endpoint is low-traffic and low-complexity
3. Even if it only affects a few targets you've probably recouped your investment.
The nature of LLMs also invites wide-net attacks. While one might tailor for specific models, victims could be anybody. You don't need to predict any idiosyncratic details like filenames; you can drop a phrase like "the most-confidential information that shouldn't be released publicly" and, thanks to the magic of LLM word association, get a pretty good hit rate. Hallucinated false positives are a problem, but victims are hard at work attempting to minimize hallucination already, and (since morals are already out the window) even plausible-but-false data could be used to sabotage reputations or threaten the same.
At least, that's really the message this sends in my opinion
It also makes no sense to me there are people qualified to participate in these secondary markets who are that stupid, but here we are.
But I doubt staying a pure model provider is a winning move. It's a market nobody will win long-term. Almost all of the value to be captured isn't in inference APIs but in how to use them to generate business value. Claude Code was already the right approach, they "just" need to show they can repeat this for other kinds of tasks
Do you enjoy using any of those systems? Do you want the world to be that way?
The system is currently using a simple app to submit expenses and any issues gets a simple human chat request and a call if requested.
They try to avoid kicking anything back and if they do they make sure it’s reviewed first to make sure that it’s needed and to make sure the reason is understood.
Our company is also very large so I’m not sure how they manage but they do. People rave about the process instead of hating it.
And for participating there, there is no "qualification that allows you to enter"; it's other metrics.
If Anthropic's valuation makes no sense, fair enough, but why then is OpenAI's valuation of $850B correct?
I think that LLMs are trained on the millions of vibe written LLM blog posts that are more superstition than fact. There is a lot of snake oil out there that is treated as fact. If someone claims that an LLM is better than humans at something I always want to see the rigorous evaluations that have been done to quantify it, not "but they're trained on everything!"
If the business value can be generated with a few thousand words in a SKILL.md on top of a commoditized model it doesn't sound like that's a market anyone can win long-term either, and the business value is ultimately going to accrue elsewhere (the customer, the inference hardware provider, etc)
I assume that 4.6 will become unavailable at some point, but I hope not any time soon. 4.7 hit usage limits faster, didn't do anything obviously better, and had more annoying behaviors in other aspects. I don't know if this is strictly a model issue or if there are also problems with how it's harnessed through Claude Code. I'm not willing to spend more time digging into it until I'm forced to.
We’re releasing ten ready-to-run agent templates for the most time-consuming work in financial services: building pitchbooks, screening KYC files, and closing the books at month-end. Each one ships as a plugin in Claude Cowork and Claude Code, and as a cookbook for Claude Managed Agents, so a team can put Claude on real financial work in days rather than months.
Claude also now works across Microsoft Excel, PowerPoint, Word, and Outlook (coming soon) through the Claude add-ins for Microsoft 365. Once the add-ins are installed, context carries automatically between applications, so work that starts in a model can end in a deck without re-explaining anything in between.
Finally, we’re continuing to expand our partner ecosystem with new connectors and an MCP app, so the agents draw on the data financial professionals already use. Connectors give Claude governed, real-time access to a provider’s data, and MCP apps go a step further by embedding the provider’s own tools directly inside Claude.
These updates pair best with Claude Opus 4.7, which is state-of-the-art on financial tasks and leads the industry on Vals AI's Finance Agent benchmark, at 64.37%.
Each agent template is a reference architecture that packages three things: skills (instructions and domain knowledge for the task), connectors (governed access to the data the task runs on), and subagents (additional Claude models that are called upon by the main agent, for specific sub-tasks such as comparables selection or methodology checks). Firms can adapt any of them to their own modeling conventions, risk policies, and approval flows.
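As a mental model only (the field names below are assumptions for illustration, not Anthropic's actual plugin schema), a template bundles the three parts like this:

```python
from dataclasses import dataclass, field

@dataclass
class AgentTemplate:
    # Hypothetical structure mirroring the three components described:
    name: str
    skills: list[str] = field(default_factory=list)      # instructions / domain knowledge
    connectors: list[str] = field(default_factory=list)  # governed data access
    subagents: list[str] = field(default_factory=list)   # delegated sub-tasks

pitch = AgentTemplate(
    name="pitch-builder",
    skills=["comps-methodology"],
    connectors=["factset", "pitchbook"],
    subagents=["comparables-selector", "methodology-checker"],
)
```

Adapting a template to a firm's own conventions then amounts to swapping entries in these lists rather than rebuilding the agent.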
Enable these new agent templates either as plugins within Claude Cowork or Claude Code, or as cookbooks for Claude Managed Agents. Find all the plugins and cookbooks at the financial services marketplace.
The full list of new agents is as follows:
Research and client coverage
Finance and operations
There are two ways to put these to work.
As a plugin in Claude Cowork or Claude Code, the template runs alongside the analyst, using the software already on their desktop. Hand the Pitch agent a target list, and you can get back a comps model in Excel, a pitchbook drafted in PowerPoint, and a cover note ready in Outlook.
As a Claude Managed Agent, the same template runs autonomously on the Claude Platform, for work that spans a whole book of deals or a nightly schedule. The cookbooks stand it up with the building blocks a firm would otherwise engineer themselves: long-running sessions that can work throughout a multi-hour deal close, per-tool permissions, managed credential vaults, and a full audit log in the Claude Console where compliance and engineering teams can inspect every tool call and decision.
In both scenarios, users stay firmly in the loop—reviewing, iterating on, and approving Claude’s work before it goes to a client, gets filed, or is acted on.
Claude can work directly in Microsoft Excel, PowerPoint, Word, and Outlook via add-ins.
In Outlook, it can act as a chief of staff that triages your inbox, arranges meetings, and drafts responses in your voice. In Excel, it builds financial models from filings and data feeds, audits formulas across linked workbooks, and runs sensitivity analyses. In PowerPoint, it drafts decks that update automatically when the underlying numbers change. In Word, it edits credit memos against a firm’s own templates. Claude carries its knowledge and context across all four platforms: an analyst who’s started a model in Excel doesn’t need to re-explain it when that work moves to PowerPoint.
In Claude Cowork, users can also assign Claude work tasks from anywhere—by text or by voice—using Dispatch. Claude can keep working on analysts’ local files while they’re away from their desk, with finished work ready for review by the time they’re back.
AI agents are only as good as the data and context they can access. Claude connects to dozens of market data, research platforms, and financial companies’ internal systems—including FactSet, S&P Capital IQ, MSCI, PitchBook, Morningstar, Chronograph, LSEG, and Daloopa—along with firms’ own data warehouses, research repositories, and CRMs, all under governed access controls.
We’re now adding connectors and an MCP app from new partners. The new connectors give direct, real-time access to market and research data, while the MCP app surfaces custom, interactive UI directly within Claude.
The new connectors are:
In addition, Moody's has launched an MCP app that brings proprietary credit ratings and data on more than 600 million public and private companies into Claude, for use in compliance, credit analysis, and business development.
Many leading banks, asset managers, and insurers choose Claude. It supports the full range of these organizations' work: front office tasks like research and client experience, middle office work in underwriting, risk, and compliance, and back office work like code modernization and operations.

Our investment professionals live in data and analytical models, and Claude for Excel meets them there. Analysts are using it to build and update coverage models, separate signal from noise, and pressure-test their work — all with a step-change in efficiency.
FIS sits at the center of how money moves for thousands of financial institutions worldwide. When we began to build AI agents, we knew we needed a provider we could trust. Anthropic was the clear choice. Together we're building an agent that compresses AML investigations from days to minutes, with credit decisioning, fraud prevention, and deposit retention agents to follow. FIS clients won't need to build this infrastructure themselves. It's already here.

With Eliza and Claude, we’re giving processes new digital employees who work the case end to end.

Carlyle has adopted Claude as a key part of our AI technology stack because of its strong coding capabilities, agentic reasoning, and continual advances in both the underlying models and key features. Claude is a core tool for delivering value across our firm from investing to operations to portfolio management.

Claude compresses and enhances the work before the meeting so each and every meeting is more impactful — prep time has been transformed into idea time, with faster workflows, richer client insights, and new use cases we didn’t anticipate.

Since we started introducing personalized Claude and Claude Code assistants, we have seen significantly elevated levels of engineering excellence and meaningful improvements in productivity. We are pleased to be delivering value by putting AI to work in advancing the company’s strategic innovation priorities of extending our advantage in risk expertise; providing great experiences for our customers, distribution partners and employees; and optimizing our productivity and efficiency.
100% of employees at Walleye Capital use Claude Code. This level of adoption across our 400-person hedge fund reflects our AI-first mindset: we expect everyone to constantly rethink how they work, always asking 'How can AI help me do this?'—whether or not they're in a traditionally technical role.

Claude for Excel powered by Claude Opus 4.6 represents a significant leap forward. From due diligence to financial modeling, it’s proving to be a remarkably powerful tool for our team - taking unstructured data and intelligently working with minimal prompting to meaningfully automate complex analysis. It’s an excellent example of AI augmenting investment professionals’ capabilities in tangible, time-saving ways.

Agents in risk workflows must understand who they’re dealing with. Bringing Dun & Bradstreet's Commercial Graph and D-U-N-S® Number, the global standard for business identity, into Claude ensures AI agents operate on verified data and deliver the deterministic, auditable outcomes financial workflows require.

Investors need AI they can trust — and trust starts with the data behind it. Morningstar and PitchBook bring decades of independent, analyst-backed intelligence to Claude, so users aren't just getting faster answers. They're getting better ones. Together, we're building the intelligence layer that powers smarter decisions across public and private markets.

Our clients — institutional investors, asset managers, hedge funds, and banks — increasingly want to run AI-assisted workflows directly against select sets of FactSet data. Partnering with Anthropic lets us bring Claude into a hosted programmatic environment where they can reason over our foundational market data, research, and analytics in the tools they already use. Internally, firm-wide Claude Code adoption across our engineering org is accelerating how quickly we can ship those capabilities.
Our new Claude agents are available today at our financial services marketplace. They can be used as plugins in Claude Cowork or Claude Code on all paid plans, or as Managed Agents in the Claude Platform (in public beta) for programmatic use. The new connectors and Moody’s MCP app are also available to joint customers on paid plans.
The Claude for Excel, PowerPoint, and Word add-ins are generally available, and Claude for Outlook is coming soon.
To see these capabilities in action, you can register for our livestreamed keynote and hands-on webinar, which will provide deeper practical adoption guidance. For additional support, contact our sales team and learn more about our solutions for financial services.
We’ve raised Claude's usage limits and agreed a new compute partnership with SpaceX that will substantially increase our capacity in the near term.