Search isn't the hard part
The week-one version
Your team builds semantic search in a week. Embed your documents, store the vectors, cosine similarity against a query. It works. Users type a question, they get back a passage. You ship it.
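The week-one version really is this small. A minimal sketch, with a toy bag-of-words `embed()` standing in for a real embedding model (the loop is the same either way: embed documents, embed the query, rank by cosine similarity):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "notes on decision making under uncertainty",
    "meeting transcript about the q3 roadmap",
    "journal entry about a possible career change",
]
index = [(d, embed(d)) for d in docs]

def search(query: str) -> str:
    # Rank every stored vector against the query; return the best passage.
    qv = embed(query)
    return max(index, key=lambda pair: cosine(qv, pair[1]))[0]
```

Swap in a real model and a vector store and this is the thing teams ship in week one. It answers queries; it does nothing else.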
Then someone asks: “What have I been thinking about this month?” Or: “What am I not seeing across everything I’ve saved?” And the search bar just sits there. It can find a document if you know what you’re looking for. It can’t tell you what you don’t know to look for.
A few users have described this to me as wanting to see their blind spots. They’ve accumulated hundreds of captures — meeting transcripts, saved articles, highlights, journal entries — and they can feel that there’s structure in there. Recurring themes. Threads they keep pulling on without realizing it. Connections between something they wrote in January and something they saved last week. The search you built can’t surface any of it, because search answers queries. Nobody knows the query for “show me the pattern I haven’t noticed.”
What sits between search and meaning
The gap between “we have search” and “our users see what their content means” is a preprocessing layer. It runs before anyone asks a question. It looks at the full corpus and builds a thematic profile — the conceptual structure of what someone’s been accumulating.
This is the part that takes months. The components:
Entity extraction. Tags, links, folders, recurring references — whatever structure exists in the content, explicit or implicit. These become semantic clusters. A user who keeps saving articles about decision-making under uncertainty and also journaling about a career change has two entities that the system needs to recognize as related, even if the user never connected them.
Thematic clustering. Entities on their own are just labels. The preprocessing layer generates thematic handles from them — questions, tensions, claims that probe what the content is actually about. A tag like #productivity in the context of someone’s journal entries about parenthood produces something more specific than either word alone. The handle has to come from the content’s own vocabulary, not from a generic topic model.
Precomputed similarity. At query time, you don’t want to be computing relationships from scratch. The preprocessing layer builds a similarity map: every thematic handle scored against every document, offline, during indexing. Search at query time becomes a lookup into precomputed structure. That’s how you get 50ms responses on a user’s full corpus without a GPU.
Domain configuration. A meeting transcript app and a reading highlights app have different content shapes. What counts as an entity, how thematic handles should be generated, what temporal weighting to apply — these vary by product. The preprocessing layer needs a configuration surface that tunes profiling to the content type.
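The precomputed-similarity idea can be sketched in a few lines. This is an illustrative toy, not Enzyme's implementation: `similarity()` here is a Jaccard stand-in for real embedding cosine, and the handle and document strings are invented. The point is the shape of the work: all scoring happens at indexing time, so query time is a dictionary lookup.

```python
def similarity(a: str, b: str) -> float:
    # Toy proxy for embedding similarity: Jaccard overlap of tokens.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

handles = ["deciding under uncertainty", "career change tension"]
docs = {"doc1": "article on deciding with incomplete information",
        "doc2": "journal entry about changing career paths"}

# Indexing time (offline): score every thematic handle against
# every document once and store the result.
sim_map = {h: {d: similarity(h, text) for d, text in docs.items()}
           for h in handles}

# Query time: a pure lookup into precomputed structure. No model
# call, no scoring loop over the corpus.
def top_doc(handle: str) -> str:
    return max(sim_map[handle], key=sim_map[handle].get)
```

With handles × documents scored ahead of time, the per-query cost no longer depends on corpus size in the hot path, which is how millisecond-scale responses on CPU become plausible.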
Each of these is a real engineering project. Together they’re 2–6 months of work, and then you maintain them. The team that built search in a week discovers that search was the easy part.
Where this runs matters
Once you have a preprocessing layer that builds thematic profiles over user content, the deployment question becomes pointed. This layer holds the most longitudinal, personal data your product touches — what someone’s been reading, writing, thinking about over months. The profile it builds is a map of someone’s intellectual life.
For products where this matters — companion apps, journal tools, meeting intelligence, anything health-adjacent — compliance teams and users both care where that data lives. “Your data never leaves your infrastructure” is a requirement, not a feature. Legal teams won't approve sending user content to a third-party memory service for profiling. Users increasingly ask the same question before trusting a product with their notes.
The architecture that satisfies both is embedded. The preprocessing runs on your infra. Each user’s index is a local file. Queries never leave the machine. No third-party data processor in your compliance docs.
What Enzyme ships
Enzyme is the preprocessing layer described above, packaged as an 11MB Rust binary with a 23MB embedding model. Total footprint: 34MB. Runs on CPU, no GPU.
What it does: entity extraction, thematic catalyst generation, precomputed similarity scoring, and a domain configuration layer that auto-tunes to your content shape. Each user’s index is a SQLite file and an embeddings binary — a few MB per user.
Queries run 100% local after indexing. Semantic search in ~50ms. Entity lookups in ~1ms. The only cloud cost is a lightweight LLM call during indexing for catalyst generation — $0.01–0.10 per user per refresh, through your own API key.
It integrates as an SDK. Init, refresh on content update, query when you need context.
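The lifecycle is small enough to sketch. The class below is a hypothetical stub showing the contract implied by the text — init, refresh on content update, query for context — not the real Enzyme SDK, whose names and signatures may differ:

```python
class EnzymeIndex:
    """Hypothetical stub of the SDK lifecycle, not the real API."""

    def __init__(self, user_id: str, index_path: str):
        # Init: one local index per user (SQLite file + embeddings binary).
        self.user_id, self.index_path = user_id, index_path
        self.docs: list[str] = []

    def refresh(self, new_docs: list[str]) -> None:
        # On content update: re-run entity extraction, catalyst
        # generation, and similarity precomputation (stubbed here).
        self.docs.extend(new_docs)

    def query(self, question: str, k: int = 3) -> list[str]:
        # Query time: lookup into precomputed structure (stubbed here).
        return self.docs[:k]

index = EnzymeIndex("user-42", "/var/enzyme/user-42.sqlite")
index.refresh(["meeting notes", "saved article"])
context = index.query("what am I not seeing?")
```

Three calls in the product's own code paths: create the index at onboarding, refresh when the user captures something, query when the app needs context.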
The question this answers
A user opens your app and asks “what am I not seeing?” Your product can either return ten keyword matches or show them the thematic structure of six months of accumulated material — the threads they keep returning to, the connections between captures they made in different contexts, the blind spots in what they’ve been paying attention to.
The second version requires a preprocessing layer that most teams underestimate until they’re building it. If that’s where you are — search works, but meaning doesn’t — here’s how Enzyme runs on your infra. If you want to try the engine on your own notes first, it’s free for individuals.
Read next: When worse embeddings give better results — the engineering behind catalyst-mediated retrieval, and why Enzyme uses a 23MB model instead of a 400MB one.