I’ve found the latter approach to work much, much better than simple “store”/“remember” systems.
So, it just feels misleading to say this can do what Claude.ai’s can do…
(I’ve been looking for a memory system that works the same for a while, so that I can switch away from Claude.ai to something else like LibreChat, but I just haven’t found any. Might be the only thing keeping me on Claude at this point.)
-
*I say Claude.ai because that’s specifically what has the system; Claude Code doesn’t.
Not casting aspersions on you personally, I’d really like this from every project, and would do the same myself.
Digging deeper, I can see it is effectively pgvector plus MCP with two functions: "recall" and "remember".
It is effectively a RAG.
You can make the argument that perhaps the data structure matters, but all of these "memory" systems effectively do the same thing, and none of them has so far proven that retrieval improves over baseline vector-DB search.
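For readers unfamiliar with what "baseline vector db search" means here, a toy sketch of a remember/recall pair over an in-memory store (the hash-based embed() is a stand-in for a real embedding model, and the store stands in for a pgvector table; none of these names are Stash's actual API):

```python
import math
import zlib

DIM = 256

def embed(text: str) -> list[float]:
    # Toy embedding: hash each word into a fixed-size bag-of-words vector,
    # then L2-normalize so dot product equals cosine similarity.
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

MEMORY: list[tuple[str, list[float]]] = []

def remember(text: str) -> None:
    MEMORY.append((text, embed(text)))

def recall(query: str, k: int = 3) -> list[str]:
    # Rank stored memories by cosine similarity to the query embedding.
    q = embed(query)
    ranked = sorted(MEMORY, key=lambda m: -sum(a * b for a, b in zip(q, m[1])))
    return [text for text, _ in ranked[:k]]

remember("the user prefers Postgres for the database")
remember("the onboarding flow is unfinished")
print(recall("which database does the user prefer", k=1))
```

A production version swaps embed() for a real model and MEMORY for a pgvector table queried with a distance operator, but the retrieval logic is exactly this.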
The only approach I've found that works is no memory, and manually choosing the context that matters for a given agent session/prompt.
1) An up-to-date detailed functional specification.
2) A codebase structured and organized in multiple projects.
3) Well documented code including good naming conventions; each class, variable or function name should clearly state what its purpose is, no matter how long and silly the name is. These naming conventions are part of a coding guidelines section in Agent.md.
My functional specification acts as the Project.md for the agent.
Then before each agentic code review I create a tree of my project directory and merge it with the codebase into one single file, adding a timestamp to the file name. This last bit seems to matter to keep the LLM from referring to older versions, and it’s also useful for quick diffs without sending the agent to git.
So far this simple workflow has been working very well in a fairly large and complex codebase.
Not very efficient token-wise, but it just works.
By the way, I don’t need to merge the entire codebase every time; I may leave projects out because I consider them done and tested, or irrelevant to the area I want to work on.
However I do include them in the printed directory tree so the agent at least knows about them and could request seeing a particular file if it needs to.
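That tree-plus-merge step can be sketched like so (the paths, the ".py" glob, and the exclusion set are all illustrative, not the commenter's actual setup):

```python
from datetime import datetime
from pathlib import Path

# Projects considered done/tested or irrelevant: excluded from the merged
# content, but still listed in the tree so the agent knows they exist.
EXCLUDE = {"legacy-project", "node_modules", ".git"}

def tree(root: Path, indent: str = "") -> list[str]:
    # Full directory listing, including excluded projects.
    lines = []
    for entry in sorted(root.iterdir()):
        lines.append(f"{indent}{entry.name}")
        if entry.is_dir():
            lines.extend(tree(entry, indent + "  "))
    return lines

def merge(root: Path) -> Path:
    # One timestamped context file: tree first, then in-scope sources.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    out = root / f"context-{stamp}.txt"
    parts = ["\n".join(tree(root))]
    for src in sorted(root.rglob("*.py")):
        if EXCLUDE.isdisjoint(p.name for p in src.parents):
            parts.append(f"\n--- {src} ---\n{src.read_text()}")
    out.write_text("\n".join(parts))
    return out
```

Calling merge(Path("my-solution")) yields a single context-20260101-120000.txt-style file; the timestamp in the name is what keeps the LLM from confusing it with earlier snapshots.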
If I am working on a real project with real people, it won’t have the complete memory of the project. I won’t have the complete memory. My memory will be outdated when other PRs are merged. I only care about my tickets.
I am starting to think this is not meant for that kind of work.
There is lots of competition in this space, how is your tool different?
I keep two files in each project - AGENTS (generic) and PROJECT (duh). All the “memory” is manually curated in PROJECT, no messy consolidation, no Russian roulette.
I do understand that this is different because of the vector search and selective unstash, but the messy-consolidation risk remains.
Also not sure about tools that further detach us from the driver seat. To me, this seems to encourage vibe coding instead of engineering-plus-execution.
Not a criticism on the product itself, just rambling.
How does it fight context pollution?
In practice, as it grows it gets just as messy as not having it.
In the example you have on the front page you say “continue working on my project”, but you’re rarely working on just one project; you might have 5 or 10 in memory, each of which made sense to add at the time.
So now you still have to say “continue working on the SaaS project”. Sure, there’s some context around details, but you pay for it by filling up your LLM context and doing extra MCP calls.
How many are we up to now? Has to be hundreds of them.
I doubt many people will honestly admit they did no design or testing, and that they believe the code is subpar.
It does give me an idea that maybe we need a third party system which can try and answer some of the questions you are asking… of course it too would be LLM driven and quite subjective.
If you care that much and don't have a foundation of trust, you need to either verify the construction is good, or build it yourself. Anything else is just wishful thinking.
In a way, if it does accomplish that, it’s a glorified vector DB.
I'd doubt any engineer who doesn't call most of their own code subpar a week or two later, looking back. "Hacking" also famously involves little design or (automated) testing, so sharing something like that doesn't mean much, unless you're trying to launch a business, but I see no evidence of that for this project.
Well no, but if people want to see a statement like this, and given that most people will want to be at least halfway honest and not admit to slop, maybe it will help nudge things in the right direction.
We even ask when cakes are made in house or frozen even though they look and taste great (at first).
A friend told me he would like Claude to remember his personality, which is exactly what Gemini is trying to do.
A machine pretending to be human is disturbing enough. A machine pretending to understand you will spiral very far into spitting out exactly what we want to read.
stash.memory
Open Source · MCP Native · PostgreSQL + pgvector
Stash makes your AI remember you. Every session. Forever. No more explaining yourself from scratch.
28 MCP tools · 6 pipeline stages · ∞ agents supported
Sound familiar?
😫 Without Stash
Hey, I'm building a SaaS for restaurants. Can you help?
Of course! Tell me about your project.
We talked about this last week... I already explained everything.
I'm sorry, I don't have access to previous conversations.
...again?
🔁 You just wasted 10 minutes re-explaining yourself. Again.
VS
😌 With Stash
Hey, continuing work on my project.
Welcome back! Last time we finalized the pricing model for your restaurant SaaS. You were about to work on the onboarding flow. Want to pick up there?
Yes! Exactly that.
Great. You also mentioned you wanted to avoid Stripe's complexity — I have that noted. Here's where we left off...
✓ Picked up instantly. Zero repetition. Full context.
New session: ❌ "Who are you again?" → ✓ Picks up where you left off
Your preferences: ❌ Re-explain every time → ✓ Already knows them
Past mistakes: ❌ Repeats the same errors → ✓ Remembers what didn't work
Long projects: ❌ Loses track of goals → ✓ Tracks goals across weeks
Token cost: ❌ Grows every session → ✓ Only recalls what matters
Switching models: ❌ Start from zero again → ✓ Memory is model-agnostic
What is Stash
Stash is a persistent cognitive layer that sits between your AI agent and the world. It doesn't replace your model — it makes your model continuous. Episodes become facts. Facts become patterns. Patterns become wisdom.
"Your AI is the brain. Stash is the life experience."
your agent: Claude, GPT, local model, anything
episodes: raw observations, append-only
facts: synthesized beliefs with confidence
relationships: entity knowledge graph
patterns: higher-order abstractions
goals · failures · hypotheses: intent, learning, uncertainty
postgres + pgvector: battle-tested infrastructure
Namespaces
Not all memory is equal. What your agent learns about you is different from what it learns about a project, which is different from what it knows about itself. Namespaces let the agent organize what it learns into clean, separate buckets — just like folders on your computer.
Each namespace is a path. Paths are hierarchical. Reading from /projects automatically includes everything under /projects/stash, /projects/cartona, and so on. You never have to think about it — the agent does.
📁 Write to one namespace. Read from any subtree.
example namespace structure
📁 / everything
📁 /users/alice who alice is, her preferences
📁 /projects all projects
📁 /projects/restaurant-saas pricing, features, decisions
📁 /projects/mobile-app design, tech stack, goals
📁 /self agent self-knowledge
📄 /self/capabilities what I do well
📄 /self/limits what I struggle with
📄 /self/preferences how I work best
🔍
Recursive reads
Recall from /projects and get everything across all sub-projects automatically.
✏️
Precise writes
Remember always targets one exact namespace — no accidental cross-contamination.
🔒
Clean separation
User memory never mixes with project memory. Agent self-knowledge stays in /self.
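The recursive-read behavior described above reduces to prefix matching on paths. A toy model (illustrative only, not Stash's implementation):

```python
# Writes target one exact namespace; reads match the whole subtree by
# path prefix. STORE maps a namespace path to its remembered facts.
STORE: dict[str, list[str]] = {}

def remember(namespace: str, fact: str) -> None:
    # Precise write: exactly one namespace, no cross-contamination.
    STORE.setdefault(namespace, []).append(fact)

def recall(namespace: str) -> list[str]:
    # Recursive read: "/projects" matches "/projects" and everything
    # under "/projects/..."; "/" matches the whole store.
    prefix = namespace.rstrip("/")
    return [
        fact
        for path, facts in sorted(STORE.items())
        for fact in facts
        if prefix == "" or path == prefix or path.startswith(prefix + "/")
    ]

remember("/projects/restaurant-saas", "pricing model finalized")
remember("/projects/mobile-app", "tech stack: Flutter")
remember("/users/alice", "prefers concise answers")
print(recall("/projects"))  # both project facts, nothing from /users
```

The hierarchy falls out of the path convention alone: no tree data structure is needed, just a prefix check at read time.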
Stash vs RAG
You've probably heard of RAG — Retrieval Augmented Generation. It's clever. But it's not memory. Here's the difference, in plain English.
📚 RAG
"A very fast librarian"
You give it a pile of documents. When you ask a question, it searches those documents and hands you the relevant pages. That's it. It doesn't remember your conversation. It doesn't learn. It doesn't know you. Every question starts from scratch — it's just a smarter search engine over files you already wrote.
VS
🧠 Stash
"A mind that grows"
Stash learns from everything your agent experiences — conversations, decisions, successes, failures. It synthesizes raw observations into facts, connects facts into a knowledge graph, detects contradictions, tracks goals, and builds an understanding of you that deepens over time. You don't write anything. It figures it out.
📚
RAG is like...
A brilliant intern who reads your files perfectly — but forgets everything the moment they leave the room.
→
🧠
Stash is like...
A colleague who was there from day one, remembers every decision you ever made, and gets more valuable every single week.
Can you use both? Yes — RAG is great for searching documents. Stash is for remembering experience. They solve different problems. Stash just goes much, much further.
Why Stash is Different
Claude.ai has memory. ChatGPT has memory. They only work for themselves — locked to one platform, one model, one company. Stash works for everyone, everywhere, forever. And it goes far deeper than any of them.
Claude.ai · ChatGPT · Stash
Remembers you: ✓ · ✓ · ✓
Works with any AI model: ✗ · ✗ · ✓
Works with local / private models: ✗ · ✗ · ✓
You own your data: ✗ · ✗ · ✓
Open source: ✗ · ✗ · ✓
Background consolidation: ✗ · ✗ · ✓
Goals & intent tracking: ✗ · ✗ · ✓
Learns from failures: ✗ · ✗ · ✓
Causal reasoning: ✗ · ✗ · ✓
Agent self-model: ✗ · ✗ · ✓
What it gives your AI: A notepad · A notepad · A mind
The Problem
🧠
AI models reason brilliantly but remember nothing. Every session you re-explain who you are, what you need, and what you've already tried. You're training the same student every single day.
💸
The workaround is stuffing full conversation history into every prompt. It's slow, expensive, and you still hit the limit. You're paying for tokens that repeat the same facts over and over.
🔄
Your agent tried something, it failed, and next session it tries the exact same thing again. There's no mechanism to carry lessons forward. Every failure is forgotten.
🔒
Only a handful of AI platforms offer memory — and only for their own models. Your custom agent, your local LLM, your Cursor setup? They all start blind. Memory shouldn't be a premium feature.
Express Setup
No infrastructure to set up. No dependencies to install manually. Docker Compose handles everything — Postgres, pgvector, Stash, all wired together and ready.
1
Clone the repo
2
Copy .env.example → .env and set your API key + model preferences
3
Run docker compose up — that's it. Stash is live.
terminal
$ git clone https://github.com/alash3al/stash
$ cd stash
$ cp .env.example .env
# edit .env with your API key,
# models and STASH_VECTOR_DIM
$ docker compose up
✓ postgres + pgvector ready
✓ stash migrations applied
✓ mcp server listening
✓ consolidation running in background
$
⚠️ Set STASH_VECTOR_DIM in your .env before first run. It cannot be changed after initialization.
01 📝 Episodes: raw observations stored as they happen
02 💡 Facts: clustered episodes synthesized by LLM
03 🕸️ Relationships: entity edges extracted from facts
04 🔗 Causal Links: cause-effect pairs between facts
05 🌀 Patterns: abstract higher-order insights
06 ⚖️ Contradictions: self-correction and confidence decay
07 🎯 Goal Inference (NEW): facts automatically tracked against active goals; progress detected, contradictions surfaced
08 💥 Failure Patterns (NEW): detect repeated mistakes and extract failure patterns as new facts, so the agent stops repeating itself
09 🔬 Hypothesis Scan (NEW): new evidence passively confirms or rejects open hypotheses; no manual intervention needed
MCP Integration
Stash speaks MCP natively. Drop it into Claude Desktop, Cursor, or any MCP-compatible agent in under 5 minutes. No SDK. No vendor lock-in. Your agent remembers you everywhere.
28 tools covering the full cognitive stack — from raw remember and recall all the way to causal chains, contradiction resolution, and hypothesis management.
Claude Desktop Cursor OpenCode Custom Agents Local LLMs Any MCP Client
stash · mcp stdio
$ ./stash mcp execute --with-consolidation
$ ./stash mcp serve --port 8080 --with-consolidation
✓ remember · recall · forget · init
✓ goals · failures · hypotheses
✓ consolidate · query_facts · relationships
✓ causal links · contradictions
✓ namespaces · context · self-model
$
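As a concrete example, registering the stdio server in Claude Desktop's claude_desktop_config.json would look roughly like this (the binary path is yours to fill in; the args mirror the stdio invocation in the terminal example above, but check the repo for the exact flags):

```json
{
  "mcpServers": {
    "stash": {
      "command": "/absolute/path/to/stash",
      "args": ["mcp", "execute", "--with-consolidation"]
    }
  }
}
```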
Agent Self-Model
Call init and Stash creates a /self namespace scaffold. The agent uses its own memory layer to build and maintain a model of its own capabilities, limits, and preferences.
/self/capabilities
What I can do well
The agent remembers where it excels and recalls these when planning how to approach a task.
/self/limits
What I struggle with
Recorded failures and known weaknesses. The anti-repeat mechanism. Never make the same mistake twice.
/self/preferences
How I work best
Learned preferences for how to operate. The agent develops a working style over time, not just facts.
Autonomous Loop
Give your agent a 5-minute research loop. It orients from past memory, researches a topic it chooses itself, invents new connections, consolidates what it learned, and closes gracefully — ready to pick up next time.
Run it as a cron job. Every 5 minutes, your agent gets smarter.
01
Orient
Recall context, active goals, open hypotheses, past failures
02
Research
Search the web on a topic the agent chooses itself
03
Think
Surface tensions, gaps, contradictions in what it now knows
04
Invent
Generate something new — a hypothesis, pattern, or discovery
05
Consolidate
Run the pipeline. Synthesize raw episodes into structured knowledge
06
Reflect + Sleep
Write a session summary. Set context for next run. Stop.
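Sketched as a cron-driven script (the agent_call stub stands in for a real MCP-connected agent invocation, and the step prompts are paraphrased from the stages above; none of this is Stash's shipped code):

```python
# One 5-minute research session: six ordered steps, each a prompt to an
# MCP-connected agent. Run the script from cron every 5 minutes.
STEPS = [
    ("orient", "Recall context, active goals, open hypotheses, past failures."),
    ("research", "Search the web on a topic you choose yourself."),
    ("think", "Surface tensions, gaps, and contradictions in what you now know."),
    ("invent", "Generate something new: a hypothesis, pattern, or discovery."),
    ("consolidate", "Run the pipeline; synthesize raw episodes into knowledge."),
    ("reflect", "Write a session summary and set context for the next run."),
]

def agent_call(step: str, prompt: str) -> str:
    # Stub: a real implementation would send `prompt` to the agent over
    # MCP and return its reply.
    return f"[{step}] done"

def run_session() -> list[str]:
    # Execute the steps in order; the final step ends the session so the
    # next cron run starts fresh from memory.
    return [agent_call(step, prompt) for step, prompt in STEPS]

run_session()
```

Because the reflect step writes its summary into memory, each run's orient step starts from the previous run's conclusions, which is what makes the loop cumulative.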
⚡
Stash itself runs on OpenRouter: the author points a local Stash instance at OpenRouter for access to hundreds of models with one API key and zero infrastructure.
☁️
OpenRouter gives you access to hundreds of models — GPT, Claude, Gemini, Mistral — all behind one OpenAI-compatible endpoint. Point Stash at it and pick any model for embedding and reasoning.
🏠
Running Ollama locally? Stash works out of the box. Use Qwen, Llama, Mistral, or any model you've pulled — your memory stays fully private, fully offline.
🔧
vLLM, LM Studio, llama.cpp server, Together AI, Groq — if it speaks the OpenAI API format, Stash speaks it back. Same provider serves both models.
⚠️
Set STASH_VECTOR_DIM before your first run and never change it. pgvector locks the embedding dimension at initialization — changing it later requires a full database reset. The default embedding model is openai/text-embedding-3-small with STASH_VECTOR_DIM=1536.
.env — same provider for embedding + reasoning

# OpenRouter
STASH_OPENAI_BASE_URL=https://openrouter.ai/api/v1
STASH_OPENAI_API_KEY=sk-or-...
STASH_EMBEDDING_MODEL=openai/text-embedding-3-small
STASH_REASONER_MODEL=anthropic/claude-3-haiku
STASH_VECTOR_DIM=1536

# Ollama (local)
STASH_OPENAI_BASE_URL=http://localhost:11434/v1
STASH_EMBEDDING_MODEL=nomic-embed-text
STASH_REASONER_MODEL=qwen2.5:3b
STASH_VECTOR_DIM=768

# Groq
STASH_OPENAI_BASE_URL=https://api.groq.com/openai/v1
STASH_EMBEDDING_MODEL=openai/text-embedding-3-small
STASH_REASONER_MODEL=llama-3.1-8b-instant
STASH_VECTOR_DIM=1536
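Since the embedding dimension cannot change after initialization, it's worth verifying the model/dimension pairing before the first write. A hypothetical pre-flight check (the dimension table below only restates the pairings from the examples above; it is not part of Stash):

```python
# Hypothetical sanity check: fail fast if the configured embedding model
# will not produce vectors of STASH_VECTOR_DIM.
KNOWN_DIMS = {
    # Pairings taken from the .env examples above.
    "openai/text-embedding-3-small": 1536,
    "nomic-embed-text": 768,
}

def check_vector_dim(model: str, configured_dim: int) -> None:
    expected = KNOWN_DIMS.get(model)
    if expected is not None and expected != configured_dim:
        raise ValueError(
            f"STASH_VECTOR_DIM={configured_dim} but {model} emits "
            f"{expected}-dim vectors; pgvector cannot change this later"
        )

check_vector_dim("openai/text-embedding-3-small", 1536)  # ok
```

Catching the mismatch before the schema is initialized avoids the full database reset the warning above describes.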
Give your agent a memory.
Open source. Apache 2.0 licensed. Backed by PostgreSQL. Works with any MCP-compatible agent.