How it works

Enzyme is a compile step for knowledge bases. It reads your workspace’s structure — tags, folders, links, timestamps — as a context graph with temporal weight on every entity. It generates questions (catalysts) from what it finds. Those questions become the search layer. Queries run through them instead of through your words.

Think of it like a compiler: the expensive work happens once at init, and everything after that is fast lookups. No runtime reasoning, no conversation history needed.
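The compile-once, look-up-many shape can be sketched in a few lines. This is a minimal illustration under loud assumptions: `Catalyst`, `compile_workspace`, and `search` are hypothetical names, not Enzyme's actual API, and word overlap stands in for the real embedding-based matching.

```python
import re
from dataclasses import dataclass

@dataclass
class Catalyst:
    entity: str    # the tag, folder, or link the question came from
    question: str  # the generated question that queries will match against

def _words(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))

def compile_workspace(questions_by_entity: dict) -> list:
    # Init phase: the expensive work (entity scan, question generation,
    # embedding) happens once, up front.
    return [Catalyst(e, q) for e, q in questions_by_entity.items()]

def search(index: list, query: str) -> Catalyst:
    # Query phase: a cheap lookup against precomputed catalysts.
    # Word overlap is a toy stand-in for embedding similarity.
    return max(index, key=lambda c: len(_words(query) & _words(c.question)))
```

The split is the point: `search` touches only precomputed catalysts, which is why no runtime reasoning or conversation history is needed.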

The pipeline is the same whether the input is a personal notes workspace or a product corpus of user saves, design explorations, or conversation transcripts. The structure changes. The engine doesn’t.

Your workspace
#research #product-decisions folder:meeting-notes [[Project Atlas]] + 847 notes
↓
Entities extracted
#research 142 notes · active this week
folder:meeting-notes 89 notes · 14 months of meetings
[[Project Atlas]] 31 references · dormant since Jan
↓
Catalysts generated
#research "The methodology shifted twice in six months — what prompted each turn?"
folder:meeting-notes "Which decisions kept getting reopened, and what was unresolved each time?"
[[Project Atlas]] "The team stopped referencing Atlas in January. What replaced it?"
↓
Search through questions
query "why did we change direction on the API"
matched catalyst "Which decisions kept getting reopened, and what was unresolved each time?"
surfaced meeting-notes/2025-11-08.md — a retro note from 4 months ago where someone named the real blocker
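The temporal labels in the middle stage (active this week, dormant since Jan) fall out of reference timestamps. A rough sketch of that classification, with a function name and thresholds that are illustrative assumptions, not Enzyme's internals:

```python
from datetime import date, timedelta

def activity_status(last_referenced: date, today: date,
                    dormant_after: timedelta = timedelta(days=60)) -> str:
    # Classify an entity by how recently the workspace referenced it.
    # The 7-day and 60-day cutoffs are assumed values for illustration.
    age = today - last_referenced
    if age <= timedelta(days=7):
        return "active this week"
    if age >= dormant_after:
        return "dormant"  # a silence worth asking about, like [[Project Atlas]]
    return "quiet"
```

Dormancy is a signal in its own right: a catalyst like "what replaced it?" exists only because the timestamps went quiet.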

Your users' accumulated choices
folder:saves folder:explorations #high-signal 1,200 artifacts over 8 months
↓
Entities extracted
folder:saves 840 items · steady growth since launch
folder:explorations 360 sessions · active bursts around deadlines
#high-signal 47 items the user explicitly flagged
↓
Catalysts generated
folder:saves "Three clusters of saves appeared in the same week — what were you circling?"
folder:explorations "You chose the constrained option four sessions in a row — is that a preference or a phase?"
#high-signal "The flagged items share a texture the rest of the collection doesn't. What's the thread?"
↓
Search through questions
query "what kind of design does this user keep returning to"
matched catalyst "You chose the constrained option four sessions in a row β€” is that a preference or a phase?"
surfaced explorations/session-2025-09.md — a session where the user spent 4x longer on tightly scoped options before choosing one. The pattern started here.
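At search time, the query is embedded and compared against catalyst embeddings computed once at init; the best match's source notes are what get surfaced. A sketch with toy two-dimensional vectors standing in for the on-device embedding model (the function names and vectors are illustrative):

```python
import math

def cosine(a, b):
    # Similarity between a query embedding and a catalyst embedding.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest_catalyst(query_vec, catalyst_vecs):
    # catalyst_vecs: {catalyst text: embedding precomputed at init}
    return max(catalyst_vecs, key=lambda c: cosine(query_vec, catalyst_vecs[c]))
```

Because the catalyst embeddings already exist, the per-query work is one embedding plus a nearest-neighbor scan, which is what keeps search local and fast.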

Init takes under 20 seconds for a thousand documents and costs only cents. Everything after that is local. Queries return in ~8ms using the on-device embedding model. No API calls at search time — no per-query cost, no data leaving your machine.

The index refreshes automatically before each agent session. New content gets indexed, embeddings update, catalysts regenerate if enough has changed.
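One plausible shape for the "if enough has changed" check, sketched with a threshold and change measure that are assumptions for illustration, not Enzyme's documented behavior:

```python
def should_regenerate(changed_docs: int, total_docs: int,
                      threshold: float = 0.05) -> bool:
    # Regenerate catalysts only when the changed fraction of the corpus
    # crosses a threshold; smaller edits would just be re-embedded.
    if total_docs == 0:
        return False
    return changed_docs / total_docs >= threshold
```

A fractional trigger like this keeps routine note-taking cheap while still letting a burst of new material reshape the questions.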

For the engineering details — the embedding model, the scoring formula, the approximate search tradeoffs — see the blog posts on approximate search and the preprocessing layer RAG is missing. For how the pipeline applies to product teams building taste and preference into their products, see for teams.