Compile-time vs runtime memory: how the query path actually works
The research page covers the numbers — how Enzyme compares to Mem0, Honcho, Zep, and Letta on token efficiency, latency, and external dependencies. This post covers the machinery: what happens during init, what happens during a query, and why the architecture produces structurally different results.
What happens at init
When you run enzyme init on a corpus of 1,000 documents, four things happen:
1. Structure reading (~2s). The engine walks the corpus and extracts entities — tags, wikilinks, folders — with temporal metadata: when each entity first appeared, when it was last active, whether it’s accelerating or going dormant. An entity isn’t a keyword. It’s a handle that the user’s own organizing behavior created.
2. Chunking and embedding (~5s). Documents are split into overlapping chunks and embedded into vectors using the compiled-in local embedding model. No model download, no API call, no data leaving the machine.
3. Catalyst generation (~10s). For each entity, the engine samples context excerpts across temporal eras — not just recent content, but the full timeline. An LLM generates catalysts: thematic questions that probe what the content is actually about.
A catalyst for an entity that spans 18 months of meeting notes doesn’t ask “what happened in meetings?” It asks something like: “The team revisited caching three times — once as a performance fix, once as a cost concern, once as a reliability question. What changed between each return?” That question cuts across the timeline. It names a pattern the user hasn’t named yet.
4. Similarity precomputation (~3s). Every catalyst is embedded and compared against every document chunk. The top 50 similar chunks per catalyst are stored. At query time, there’s no vector search over the chunks themselves: those similarities are already computed.
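To make step 4 concrete, here is a minimal sketch of precomputing and storing the top chunks per catalyst. It is an illustration under assumptions, not Enzyme's implementation: the table name catalyst_chunk, the function precompute, and the toy two-dimensional vectors are all hypothetical, and a real engine would use the embedding model's vectors and a faster similarity kernel.

```python
import heapq
import sqlite3

TOP_K = 50  # the post stores the top 50 chunks per catalyst


def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def precompute(conn, catalysts, chunks, top_k=TOP_K):
    """Store the top_k most similar chunks for every catalyst.

    catalysts and chunks map an id to an already-computed embedding vector.
    The schema below is an assumption made for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS catalyst_chunk ("
        "catalyst_id TEXT, chunk_id TEXT, score REAL, "
        "PRIMARY KEY (catalyst_id, chunk_id))"
    )
    for cat_id, cat_vec in catalysts.items():
        scored = ((dot(cat_vec, vec), chunk_id) for chunk_id, vec in chunks.items())
        for score, chunk_id in heapq.nlargest(top_k, scored):
            conn.execute(
                "INSERT OR REPLACE INTO catalyst_chunk VALUES (?, ?, ?)",
                (cat_id, chunk_id, score),
            )
    conn.commit()


# Toy data: one catalyst, three chunks, keep the best two.
conn = sqlite3.connect(":memory:")
catalysts = {"caching-returns": [1.0, 0.0]}
chunks = {"note-1": [0.9, 0.1], "note-2": [0.1, 0.9], "note-3": [0.7, 0.3]}
precompute(conn, catalysts, chunks, top_k=2)
rows = conn.execute(
    "SELECT chunk_id, score FROM catalyst_chunk "
    "WHERE catalyst_id = ? ORDER BY score DESC",
    ("caching-returns",),
).fetchall()
```

The primary key on (catalyst_id, chunk_id) is what would let a later query fetch one catalyst's chunks without touching the rest of the table.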
The important part: catalyst generation happens once per entity, and a user’s conceptual lens is stable even as content grows. New documents slot into existing catalyst relationships without regeneration. The LLM cost scales with entity count (tags, links), not document count; only the cheaper embedding and similarity steps grow with the corpus.
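One way to picture "slotting in" without regeneration: score each new chunk against the existing catalyst vectors and keep it only where it beats the weakest stored chunk. The sketch below assumes a bounded min-heap per catalyst; slot_in and the in-memory top_chunks mapping are hypothetical names, and the post does not say how Enzyme actually performs this update.

```python
import heapq

TOP_K = 50


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def slot_in(top_chunks, catalysts, chunk_id, chunk_vec, top_k=TOP_K):
    """Score one new chunk against existing catalysts; no LLM call involved.

    top_chunks maps catalyst id -> min-heap of (score, chunk_id), at most top_k long.
    Returns the ids of catalysts whose stored list changed.
    """
    changed = []
    for cat_id, cat_vec in catalysts.items():
        heap = top_chunks.setdefault(cat_id, [])
        entry = (dot(cat_vec, chunk_vec), chunk_id)
        if len(heap) < top_k:
            heapq.heappush(heap, entry)
            changed.append(cat_id)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)  # evict the weakest stored chunk
            changed.append(cat_id)
    return changed


# Toy data: two catalysts, each already holding its top two chunks.
catalysts = {"caching-returns": [1.0, 0.0], "team-rituals": [0.0, 1.0]}
top_chunks = {
    "caching-returns": [(0.2, "old-a"), (0.8, "old-b")],
    "team-rituals": [(0.6, "old-c"), (0.9, "old-d")],
}
for heap in top_chunks.values():
    heapq.heapify(heap)

changed = slot_in(top_chunks, catalysts, "new-1", [0.5, 0.3], top_k=2)
```

In this toy run the new chunk displaces the weakest entry under one catalyst and leaves the other untouched, and the work is one dot product per catalyst.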
The query path
Here’s what Enzyme does when a query arrives:
- Embed the query (1–2ms, compiled-in local model on CPU)
- Find top catalysts by dot product (0.5ms, ~100 pre-embedded vectors)
- Look up pre-computed chunk similarities from SQLite (2–3ms, indexed lookup)
- Aggregate, weight by recency and multi-catalyst coverage, deduplicate (1–2ms)
~8ms total. Zero external calls. The query is a database lookup of relationships that were already computed — relationships that required reading the full corpus, spanning the full timeline, and generating thematic questions no single retrieval call would have time to produce.
A catalyst that connects a meeting note from 18 months ago to a journal entry from last week was identified during init, when the engine had time to read everything. At query time, finding that connection is a table lookup.
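The four query steps can be sketched in a few lines. This is a hedged illustration, not Enzyme's code: embed is a stand-in keyword counter rather than a real embedding model, the catalyst_chunk schema and its recency column are assumptions, and the weighting formula (score times recency, multiplied by the number of catalysts that reached the chunk) is just one plausible reading of "weight by recency and multi-catalyst coverage".

```python
import sqlite3
from collections import defaultdict


def embed(text):
    """Placeholder for the local embedding model: a 2-d keyword count, illustration only."""
    words = text.lower().split()
    return [float(words.count("caching")), float(words.count("meetings"))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def query(conn, catalysts, text, top_catalysts=2, limit=5):
    q = embed(text)  # step 1: embed the query
    # Step 2: rank the small set of catalyst vectors by dot product.
    ranked = sorted(catalysts, key=lambda c: dot(q, catalysts[c]), reverse=True)
    scores, hits = defaultdict(float), defaultdict(int)
    for cat_id in ranked[:top_catalysts]:
        # Step 3: fetch similarities that were stored at init time.
        rows = conn.execute(
            "SELECT chunk_id, score, recency FROM catalyst_chunk WHERE catalyst_id = ?",
            (cat_id,),
        )
        for chunk_id, score, recency in rows:
            scores[chunk_id] += score * recency  # step 4: recency weighting (assumed)
            hits[chunk_id] += 1
    # Step 4, continued: boost chunks reached through several catalysts.
    # Keying by chunk_id also deduplicates.
    final = {c: s * hits[c] for c, s in scores.items()}
    return sorted(final, key=final.get, reverse=True)[:limit]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE catalyst_chunk (catalyst_id TEXT, chunk_id TEXT, score REAL, recency REAL)"
)
conn.execute("CREATE INDEX idx_catalyst ON catalyst_chunk (catalyst_id)")
conn.executemany(
    "INSERT INTO catalyst_chunk VALUES (?, ?, ?, ?)",
    [
        ("caching", "note-2023", 0.9, 0.5),
        ("caching", "journal-last-week", 0.8, 1.0),
        ("meetings", "journal-last-week", 0.6, 1.0),
        ("meetings", "standup-log", 0.7, 0.5),
    ],
)
catalysts = {"caching": [1.0, 0.0], "meetings": [0.0, 1.0]}
results = query(conn, catalysts, "why does caching keep coming up in meetings")
```

Nothing in query calls out to a network service or an LLM; the only I/O is the SQLite read, which is the point the section makes about where the latency goes.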
Why depth doesn’t require latency
Every query-time memory system is time-constrained. The LLM has one retrieval window — typically under a second — to search, rank, and synthesize. That’s enough to find a stored fact (“user prefers Python”) or recall a recent conversation. It’s not enough to identify a pattern that spans 18 months of content across different phrasings, or to name a conceptual thread the user hasn’t articulated yet.
Enzyme moves the expensive work to init. The depth of the result isn’t limited by query-time compute — it was already computed. The cost consequence follows: query-time memory tools have a cost floor per query because each retrieval involves LLM inference or external embedding API calls. Enzyme’s query path has no external dependency at all.
The cold-start advantage
A user signs up for your product and imports their reading highlights. With query-time memory, the intelligence layer is empty because it builds from conversations, not documents. With Enzyme, it’s ready as soon as init finishes, in under 20 seconds. The first conversation is as rich as the hundredth because the concept graph was compiled from what the user already brought in.
For products built on import — reading managers, collection tools, research platforms, curation apps — the compile-time approach means the value is immediate.
Apply: projecting understanding across corpora
Query-time memory is scoped to a single user’s conversation history. Enzyme’s concept graph is a portable artifact.
enzyme apply ./new-corpus takes the catalysts from one collection and projects them onto a different set of documents. The user’s intellectual framework becomes the lens for exploring unfamiliar content — papers that resonate with their existing thinking, articles that connect to themes they’ve been tracking, repositories that overlap with their design principles.
This is structurally impossible with query-time memory, because the understanding isn’t a separable artifact. It’s entangled with the conversation history that produced it.
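A rough sketch of what "separable artifact" could mean in practice: if catalysts are stored as question text plus an embedding, with no pointer back to the chunks that produced them, they can be scored against any other corpus. The JSON layout and the function names export_catalysts and apply_catalysts are hypothetical, chosen for illustration; the post does not describe the real artifact format or how enzyme apply works internally.

```python
import json


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def export_catalysts(catalysts):
    """Serialize catalysts (question text plus embedding) with no reference to source chunks."""
    return json.dumps(catalysts)


def apply_catalysts(artifact, new_chunks, top_k=3):
    """Rank chunks of an unfamiliar corpus under each imported catalyst."""
    projected = {}
    for cat_id, cat in json.loads(artifact).items():
        ranked = sorted(
            new_chunks,
            key=lambda c: dot(cat["vector"], new_chunks[c]),
            reverse=True,
        )
        projected[cat_id] = ranked[:top_k]
    return projected


# Toy data: one catalyst from the user's notes, applied to a set of papers.
artifact = export_catalysts(
    {
        "caching-returns": {
            "question": "What changed between each return to caching?",
            "vector": [1.0, 0.0],
        }
    }
)
papers = {"paper-cdn": [0.8, 0.2], "paper-hiring": [0.1, 0.9]}
projected = apply_catalysts(artifact, papers, top_k=1)
```

The new corpus only needs to be embedded with the same model; no catalyst has to be regenerated for the projection to work.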
When to use which
Query-time memory when:
- The primary interaction is conversational (chatbots, companions, support)
- The user starts with no existing content
- Understanding should accumulate incrementally from behavior
- You need fact-level recall from past conversations
Compile-time semantics when:
- Users bring existing content (imports, collections, knowledge bases)
- You need intelligence from day one, not after 50 conversations
- The value is in cross-source patterns, not individual fact recall
- Query-time cost needs to stay out of the hot path
- Understanding needs to transfer across corpora
Both when your product has an import-and-converse pattern. Compile the user’s imported content with Enzyme so the first conversation is rich. Layer query-time memory on top for ongoing personalization from chat.
Try it. Enzyme compiles Obsidian vaults, Readwise exports, and any markdown corpus into a searchable concept graph. Under 20 seconds.
Building a product? The SDK is in private beta for product teams with accumulated user histories — let’s talk.