The preprocessing layer RAG is missing

RAG has a ceiling

Ask someone with 2,000 notes “what have you been circling around lately?” and they’ll think for a minute and give you something interesting. Ask a RAG pipeline the same question and it has nothing. There’s no passage to retrieve. The question is about a pattern across documents, and chunk-embed-retrieve can’t see patterns.

The ceiling shows up the moment content accumulates past what any single query can reach.

A writer with years of notes — journal entries, research, half-finished drafts — ran a search about a direction she’d been developing. She expected the recent notes. What came back was a reflection from months earlier, written in a completely different context, that used language she recognized as hers but had forgotten writing. Her reaction: “It talks like a writer. How??”

The surprise wasn’t that the system found a note. It was that it found the voice — the way she thinks about things — threaded across entries she’d never connected.

A founder tracking patterns across his research archive asked Enzyme about a theme he’d been circling. The system surfaced a thread running through notes he’d written over months — a pattern he hadn’t named yet. He described it as “a consultant looking at you… as a friend who’s an analyst.” The connection existed in the language, but no keyword search would have found it.

These aren’t retrieval problems. “What have I been circling around lately?” doesn’t have a passage to retrieve. It’s a question about patterns across documents, and RAG’s architecture — cold start per query, flat index, no accumulated understanding of the corpus — can’t answer it.

IBM’s own research: “pure RAG is not really giving the optimal results.” Industry estimates put RAG’s production failure rate at 40–60%. The gap is consistent. Factual lookup works. Meaning doesn’t.

Why RAG struggles with accumulated content

Every RAG query starts from scratch:

User query → embed → cosine similarity against chunks → top-k passages → LLM prompt
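The flat-index loop above can be sketched in a few lines. The stub embedder below is a stand-in for a real sentence-embedding model; the shape of the pipeline, not the model, is the point:

```python
import zlib
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stub embedder: hashed bag-of-words into a small unit vector.
    # A real pipeline would call a sentence-embedding model here.
    vec = np.zeros(64)
    for token in text.lower().split():
        vec[zlib.crc32(token.strip(".,!?").encode()) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # embed -> cosine similarity against every chunk -> top-k passages
    q = embed(query)
    scores = [float(q @ embed(c)) for c in chunks]
    top = sorted(range(len(chunks)), key=lambda i: -scores[i])[:k]
    return [chunks[i] for i in top]

chunks = [
    "Status update on the migration project.",
    "Notes on trusting the process during a rewrite.",
    "Grocery list and errands.",
]
print(retrieve("migration project status", chunks, k=1))
```

Note what is absent: nothing persists between calls. Every query re-embeds, re-scores, and starts from zero.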

No concept of what themes run through someone’s notes. No sense of what ideas recur across their bookmarks or what tensions keep appearing in their meeting transcripts. The index is flat. Each query hits it fresh.

This is fine for “find me the passage about X.” It falls apart when someone asks “what does all of this add up to?” — because that question requires understanding the corpus as a whole, not matching against individual chunks of it.

What Enzyme does differently

Enzyme is a preprocessing layer. Before anyone asks a question, it builds a thematic profile of the entire corpus:

Corpus → entity extraction → thematic clustering → catalyst generation → precomputed similarity

Entities are the concepts, people, projects, and ideas that recur across the content. Catalysts are LLM-generated thematic handles — questions and claims that probe what the corpus is actually about, grounded in the user’s own vocabulary. Similarity between catalysts and documents is precomputed at index time.

When a query arrives, it doesn’t search documents directly. It finds which concepts are relevant, then retrieves the documents grouped by concept.
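Concept-mediated lookup can be sketched as follows. The catalyst texts, document names, and stub embedder are invented for the example, not Enzyme's actual output; the structure is what matters: the query is scored against a handful of precomputed catalysts, and the winning catalyst's document group comes back even when the documents share no words with the query.

```python
import zlib
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stub embedder (stand-in for the real model): hashed bag-of-words.
    vec = np.zeros(64)
    for token in text.lower().split():
        vec[zlib.crc32(token.strip(".,!?").encode()) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Precomputed at index time: catalyst text -> documents clustered under it.
# (Hypothetical catalysts and note names, for illustration only.)
concept_index = {
    "What does letting go of control look like in this person's work?": [
        "trusting the process (note, March)",
        "releasing grip (note, June)",
    ],
    "How is the migration project progressing?": [
        "migration status update (note, May)",
    ],
}
catalyst_vecs = {c: embed(c) for c in concept_index}  # cached, not per-query

def query_concepts(query: str) -> list[str]:
    # Match the query against catalysts, never against raw chunks.
    q = embed(query)
    best = max(catalyst_vecs, key=lambda c: float(q @ catalyst_vecs[c]))
    return concept_index[best]

print(query_concepts("letting go of control"))
```

The returned notes contain none of the query's words; the catalyst is the bridge between the user's phrasing and their own earlier vocabulary.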

Someone searches “letting go of control.” RAG looks for passages containing those words or their semantic neighbors. Finds nothing — the user wrote “trusting the process” in one note and “releasing grip” in another. Enzyme already clustered those under a concept it identified during indexing. Three notes across six months come back, none containing the query terms, all about the same tension the user was trying to name.

RAG searches your words. Enzyme queries your ideas.

The preprocessing layer, not a replacement

Enzyme doesn’t replace RAG. A team building a note-taking app can use RAG for keyword-adjacent search and Enzyme for the thematic layer — trending topics, cross-note connections, what a user’s been thinking about over time.

The two systems operate at different levels. RAG runs at query time against individual passages. Enzyme runs at index time against the full corpus. RAG needs term overlap or semantic proximity to find matches. Enzyme matches across different phrasings because it already identified the underlying concept. RAG costs per query (embedding + retrieval). Enzyme’s queries are local and precomputed — $0 at search time, with a one-time indexing cost of cents per user.

They answer different questions. RAG answers “which passage is about X?” Enzyme answers “what has this person been thinking about?”

Local-first semantic search: how Enzyme ships

11MB binary. 23MB embedding model (int4 ONNX). SQLite for persistence. Runs on CPU, no GPU. Semantic search in ~50ms.

The only cloud cost is a lightweight LLM call during indexing — $0.01–0.10 per user per refresh. After that, queries are 100% local. No per-query pricing. Data never leaves your infrastructure.
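A plausible sketch of what "$0 at search time" looks like in practice (the table and column names are hypothetical, not Enzyme's actual schema): catalyst-document similarity is written to SQLite once at index time, so a search reduces to an indexed local lookup.

```python
import sqlite3

# In-memory DB for the sketch; a real deployment would use a file on disk.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE catalyst_doc (
    catalyst_id INTEGER,
    doc_path    TEXT,
    score       REAL        -- precomputed at index time
);
INSERT INTO catalyst_doc VALUES
    (1, 'notes/trusting-the-process.md', 0.81),
    (1, 'notes/releasing-grip.md',       0.74),
    (2, 'notes/migration-status.md',     0.90);
""")

def docs_for_catalyst(catalyst_id: int, k: int = 10) -> list[str]:
    # No embedding call, no network: just an ORDER BY over precomputed scores.
    rows = db.execute(
        "SELECT doc_path FROM catalyst_doc WHERE catalyst_id = ? "
        "ORDER BY score DESC LIMIT ?",
        (catalyst_id, k),
    )
    return [path for (path,) in rows]

print(docs_for_catalyst(1))
```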

For teams already running RAG: Enzyme is the preprocessing step that makes your retrieval more meaningful. It doesn’t compete with your vector database. It gives your vector database something worth indexing.

When to use RAG, Enzyme, or both

RAG is the right tool when your users ask factual questions about known content. “What’s the status of Project X?” “What did the contract say about termination?” Passage retrieval.

Enzyme is the right tool when the value is in accumulated meaning — note-taking apps, meeting transcript tools, reading highlight managers, companion apps. Anywhere users build up a corpus over time and the product’s job is to show them what it adds up to.

Both when you want retrieval and thematic structure. Enzyme’s precomputed concept map can feed into RAG’s retrieval step. Instead of searching a flat chunk index, search against thematically grouped and pre-scored documents.
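One way the composition could look, as a sketch: the concept map narrows the candidate set, then an ordinary passage ranker runs inside that group. The `pick_concept` matcher and `rank_passages` ranker below are placeholder stand-ins for the two layers, not real APIs.

```python
from typing import Callable

def hybrid_retrieve(
    query: str,
    concept_groups: dict[str, list[str]],                   # precomputed: concept -> docs
    pick_concept: Callable[[str], str],                     # thematic layer (index-time)
    rank_passages: Callable[[str, list[str]], list[str]],   # RAG ranker (query-time)
    k: int = 3,
) -> list[str]:
    # 1. Thematic layer: pick the relevant concept group (local, precomputed).
    group = concept_groups[pick_concept(query)]
    # 2. Retrieval layer: run normal passage ranking inside that group only.
    return rank_passages(query, group)[:k]

# Toy stand-ins to show the shape:
groups = {"control": ["trusting the process", "releasing grip"],
          "migration": ["migration status update"]}
pick = lambda q: "control" if "control" in q else "migration"
rank = lambda q, docs: sorted(docs)  # placeholder ranker
print(hybrid_retrieve("letting go of control", groups, pick, rank))
```

The ranker never sees the full flat index, only the thematically pre-grouped documents, which is the "something worth indexing" point made above.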

The question isn’t which one. It’s whether retrieval alone is enough for what your users actually need.


Try it. Enzyme indexes Obsidian vaults using catalyst-mediated semantic search. Free for individuals. If you’re building a product and evaluating memory infrastructure, see how Enzyme runs on your infra.

Read next: When worse embeddings give better results — the architecture behind catalyst-mediated retrieval and why looseness in the embedding layer is a feature.