Search and OCR
Every capture carries an on-device text layer your agent can read and search, so it can find the screenshot from an hour ago without you scrolling back to it.
A capture isn't just pixels. The moment you take one, Noru reads the text off it on your Mac and keeps that text alongside the image. That text layer is what lets your agent search your captures by what was in them, and skip the pixels entirely when it only needs the words.
The on-device text layer
Noru runs OCR on every capture using Apple's built-in Vision framework. It reads the text on screen (an error message, a URL, a stack trace, the label on a button) and stores it with the image. Neither the capture nor its text leaves your machine for this. There's nothing to enable and nothing to wait for: the text is there on every shot.
The text comes through verbatim, so your agent gets the same words you'd read on screen. That's the point of OCR here. It gives the model a second, lighter way to understand a capture, and it makes your history searchable.
Searching by what you saw
You don't run a search yourself. You describe the capture you mean, and your agent reaches for search_captures, which looks through the on-screen text, any transcript, and your notes for the word or phrase. Talk to it the way you'd talk to a person who was looking over your shoulder:
"Pull up the screenshot with the Stripe 400 error" or "where was that stack trace I showed you earlier?"
Search returns lightweight locator rows, not images, so it's cheap. Your agent reads the hits, picks the right one, and then pulls the actual pixels with get_capture. It's all read-only: searching never consumes or disturbs the live handoff of whatever you captured most recently.
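To make the search-then-fetch flow concrete, here is a minimal in-memory sketch in Python. It is not Noru's implementation or its tool schema. Only the names search_captures and get_capture come from this page; the Capture and LocatorRow types, their fields, the snippet window, and the CaptureStore class are hypothetical. It assumes exact, case-sensitive substring matching over the on-screen text, transcript, and notes (whether real matching ignores case isn't stated here). Search returns small rows instead of images, and fetching by id leaves the "latest capture" handoff alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    # Hypothetical record: the pixels plus the text kept alongside them.
    id: str
    image: bytes            # stand-in for the screenshot pixels
    ocr_text: str           # the on-device text layer
    transcript: str = ""
    notes: str = ""


@dataclass(frozen=True)
class LocatorRow:
    # A lightweight hit: enough to pick the right capture, no pixels.
    id: str
    field: str              # which text field matched
    snippet: str            # a short window of text around the match


class CaptureStore:
    def __init__(self, captures: list[Capture]):
        self._by_id = {c.id: c for c in captures}
        # Hypothetical stand-in for the live handoff: the most recent capture.
        self.latest = captures[-1].id if captures else None

    def search_captures(self, query: str, context: int = 20) -> list[LocatorRow]:
        """Exact, case-sensitive substring search; returns rows, never images."""
        rows = []
        for c in self._by_id.values():
            for field in ("ocr_text", "transcript", "notes"):
                text = getattr(c, field)
                pos = text.find(query)
                if pos != -1:
                    start = max(0, pos - context)
                    end = pos + len(query) + context
                    rows.append(LocatorRow(c.id, field, text[start:end]))
                    break  # one row per capture is enough to locate it
        return rows

    def get_capture(self, capture_id: str) -> Capture:
        """Read-only fetch by id; leaves self.latest untouched."""
        return self._by_id[capture_id]


# Usage: search cheaply, then fetch the pixels only for the hit you want.
store = CaptureStore([
    Capture("c1", b"<png>", "POST /v1/charges 400 Bad Request (Stripe)"),
    Capture("c2", b"<png>", "Traceback (most recent call last):", notes="auth bug"),
])
hits = store.search_captures("400")
capture = store.get_capture(hits[0].id)
```

Returning short snippets rather than images is what keeps the search step cheap in this sketch: the agent pays for pixels once, on the capture it actually picked.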
Just the text, when that's all it needs
Sometimes the words are the whole answer and the image is wasted tokens. When your agent only needs what a capture said, it passes text_only and gets the text layer without the picture. That's useful for a long error log or a wall of config, where reading beats looking. Most of the time it fetches the image as well, because seeing the layout matters, but the option is there when you're minimizing tokens.
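A small Python sketch of what the text_only switch changes. Apart from the get_capture and text_only names, everything here is hypothetical: the Capture and CaptureResult types and the shape of the result are illustrative assumptions, not Noru's actual tool schema. The point is that the text layer always comes back, and the pixels are simply left out when text_only is set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capture:
    # Hypothetical record: pixels plus the on-device text layer.
    id: str
    image: bytes
    ocr_text: str


@dataclass(frozen=True)
class CaptureResult:
    # What the agent gets back; image is None when text_only is set.
    id: str
    text: str
    image: Optional[bytes]


def get_capture(captures: dict[str, Capture], capture_id: str,
                text_only: bool = False) -> CaptureResult:
    """Return the text layer, plus the pixels unless text_only is True."""
    c = captures[capture_id]
    return CaptureResult(c.id, c.ocr_text, None if text_only else c.image)


# Usage: a long error log where the words are all the agent needs.
caps = {"log": Capture("log", b"<png>", "ERROR connection refused\nERROR retry 3/3 failed")}
words_only = get_capture(caps, "log", text_only=True)   # image is None
with_pixels = get_capture(caps, "log")                   # image included
```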
Pro: recall by meaning and by sight
The free text search is exact: it finds captures that literally contain your words. The memory tier adds two ways to find things when you don't remember the exact wording: recall by meaning and recall by sight.
Both run on your Mac, like everything else. They're part of the one-time Pro purchase, alongside audio. See pricing for what's free and what's Pro.
It's all in the tools
Search, fetch by id, text-only reads, and visual recall all happen through tools your agent calls, not buttons you press. The full set, and exactly when each one fires, is in the tool reference.