How to search inside saved videos: the complete 2026 guide
Stop scrolling through dozens of saved YouTube and Instagram videos to find the moment you remember. Here's how transcript search actually works in 2026, four tools that do it, and a step-by-step setup.
You watched a great YouTube interview six months ago. Someone said something that stuck — a specific phrase about pricing strategy, or a particular study they cited. Now you need it. You open your "Watch later" list and scroll. Past 200 videos. Past dozens of cooking clips you saved by accident. The video is there somewhere; you just can't tell which one.
This is the failure mode every video bookmark tool has had since the YouTube favourites button shipped in 2005. Bookmarks save the URL. They don't save what was said. Searching them is searching titles, which are SEO-optimised marketing taglines, not what the video is actually about.
In 2026 you can do better. The technology to search inside the spoken content of every video you've saved — across YouTube, Instagram Reels, and TikTok, in any language — is now in commodity territory. This guide walks through how it works, the four real tools that do it, and how to set up the workflow end-to-end.
Why your bookmarks fail you
Three things are working against you.
Titles are not content. A 90-minute podcast on "How we built a $40M ARR business" might be saved because the host's third question — at 38 minutes in — is the only thing you needed. The title doesn't know that. Neither does any tag you'd realistically add at save time.
Memory of where you saw it decays fast. Research on video recall (NN/g, 2022) shows users misattribute the source platform within ~14 days at a rate above 30%. You'll remember the quote. You won't remember if it was YouTube, a Reel, or a TikTok.
Transcripts exist but are scattered. YouTube has captions for 84% of uploads in major languages. Instagram and TikTok auto-generate them. The transcripts are public. But each platform makes you search inside them one video at a time, after you've already opened the video — exactly when you don't know which video to open.
What "searching inside videos" actually means in 2026
Two distinct techniques are now mainstream, and most modern tools combine them:
Full-text search (FTS). The classical approach: index every word of the transcript, then match the user's query as a literal substring or token. Fast, deterministic, easy to explain. Fails when the user remembers the idea but uses different words. If you search "what is the cost of customer acquisition" but the speaker said "how much do we pay to bring in a new customer," FTS will not find it.
Semantic search via vector embeddings. Each chunk of transcript gets converted into a 768- or 1536-dimensional vector by a model like OpenAI's text-embedding-3-small. Your query is converted into the same kind of vector. The system returns chunks whose vectors are nearest in cosine distance. This works across phrasings, across languages, and even when the speaker used jargon you've never heard.
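To make that concrete, here's a minimal sketch of semantic retrieval in Python, assuming OpenAI's embeddings API; the chunk texts are invented for illustration:

```python
# Minimal semantic retrieval: embed chunks and a query, rank by cosine similarity.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

chunks = [
    "how much do we pay to bring in a new customer",
    "our churn rate dropped below two percent last quarter",
]
chunk_vecs = embed(chunks)
query_vec = embed(["what is the cost of customer acquisition"])[0]

# Cosine similarity = dot product over the product of vector norms.
sims = chunk_vecs @ query_vec / (
    np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
)
best = int(np.argmax(sims))
print(chunks[best])  # the paraphrased chunk wins despite zero word overlap
```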
The state of the art in May 2026 is hybrid search with reciprocal rank fusion (RRF): run both FTS and semantic search in parallel, then merge the rankings. Tools that ship hybrid retrieval consistently beat single-method ones in head-to-head evaluations (Microsoft Research, 2024) because the two methods catch different failure modes — FTS nails exact-phrase recall, semantic catches paraphrased intent.
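RRF itself is only a few lines: each method contributes 1/(k + rank) for every document it returns, and the summed scores decide the merged order. A minimal sketch with made-up chunk ids (k = 60 is the conventional constant):

```python
# Reciprocal rank fusion: merge an FTS ranking and a semantic ranking.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Each ranking lists chunk ids, best first; k=60 is the conventional constant."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fts_hits = ["chunk_12", "chunk_40", "chunk_7"]        # exact-phrase matches
semantic_hits = ["chunk_40", "chunk_3", "chunk_12"]   # nearest-vector matches
print(rrf([fts_hits, semantic_hits]))  # chunks found by both methods rise to the top
```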
The four tools that do this in 2026
There's no shortage of "save videos" apps. Only a small set actually searches transcript content. Here's the honest comparison.
| Tool | Search type | Platforms | Free tier | Starts at |
|---|---|---|---|---|
| SavedThat | Hybrid (semantic + FTS, RRF) | YouTube, Instagram, TikTok | 30 saves/mo | $6.99/mo |
| Glasp | FTS over highlights | YouTube only | Unlimited highlights | $0 |
| Mem | Semantic, all content | Manual paste only | Trial | $8.33/mo |
| DIY (Whisper + pgvector) | Hybrid, you wire it | Anything you script | Free, self-hosted | Compute cost |
Each one optimises for something different.
SavedThat — bookmark-first, search-native
Built around: the moment you save a video. Paste a YouTube/Instagram/TikTok URL, the system pulls the transcript, chunks it into ~18-second windows with 5-second overlap, embeds each chunk, and indexes both vectors and full-text. Search across your library returns the exact moment with a deep-link to that timestamp on the original platform.
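For intuition, here's a sketch of what windowed chunking with overlap can look like over timed caption segments. It's illustrative, not our exact implementation, and the segment tuples are invented:

```python
# Group timed caption segments into ~18 s windows with 5 s overlap.
# Each segment is (start_seconds, end_seconds, text).
def chunk_segments(segments, window=18.0, overlap=5.0):
    step = window - overlap  # advance 13 s per window
    chunks, t = [], 0.0
    end_time = max(end for _, end, _ in segments)
    while t < end_time:
        text = " ".join(s for start, end, s in segments
                        if start < t + window and end > t)
        if text:
            chunks.append({"start": t, "end": t + window, "text": text})
        t += step
    return chunks

segments = [(0.0, 6.2, "So the question everyone asks is"),
            (6.2, 14.8, "how much do we pay to bring in a new customer"),
            (14.8, 21.0, "and for us that number was forty dollars")]
print(len(chunk_segments(segments)))  # 2 overlapping windows: 0-18 s and 13-31 s
```

The overlap matters: without it, a sentence straddling a window boundary would be split across two chunks and match neither.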
Strengths: the only one of the four that handles all three short-video platforms. Cross-user dedup means if you save a video someone else already saved, your library updates instantly with no extra credit cost. Hybrid retrieval is the default, not an upgrade tier.
Trade-offs: you're paying for the transcription credits on Instagram and TikTok (YouTube transcripts are free to fetch). Free tier gets 30 saves/month, which is enough to evaluate but cramped if you save heavily.
Glasp — for the YouTube highlight crowd
Built around: the active reader who watches with intent. Glasp lets you highlight specific sentences in YouTube transcripts as you watch, then search across only those highlights. It's been the dominant tool for the highlights workflow since it launched in 2021.
Strengths: zero cost, works without an account for one-shot exports, integrates with Readwise.
Trade-offs: YouTube only — no Instagram, no TikTok. Search is keyword over highlights, not over the full transcript, so unhighlighted moments are invisible. You have to actively highlight; passive saves don't get the search benefit.
Mem — semantic, but you do the saving manually
Built around: the AI-first note-taker. Mem ingests anything you paste — including YouTube transcripts — and runs semantic search across all of it via OpenAI embeddings.
Strengths: broadest content scope. If your video bookmarks coexist with notes, articles, and Slack messages in one searchable surface, Mem is the cleanest answer.
Trade-offs: no platform integrations. You paste the transcript yourself, manually. For prosumers who actually save 50+ videos a month, this is friction the math doesn't survive.
DIY — Whisper + pgvector
Built around: the developer who wants the same thing without a SaaS subscription. Run Whisper locally to transcribe, store chunks in pgvector on a Postgres instance you control, expose a search endpoint. Total bill of materials: a Mac mini, Docker, and ~10 hours of plumbing.
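A compressed sketch of the core loop, assuming the openai-whisper package, OpenAI embeddings trimmed to 768 dimensions, and a chunks table you've created yourself (schema in the comments; all names here are hypothetical):

```python
# DIY pipeline: transcribe locally with Whisper, embed each segment, store in pgvector.
# Assumes Postgres with pgvector and a table created roughly like:
#   CREATE EXTENSION IF NOT EXISTS vector;
#   CREATE TABLE chunks (id serial PRIMARY KEY, video_url text,
#                        start_s real, body text, embedding halfvec(768));
import psycopg
import whisper              # pip install openai-whisper
from openai import OpenAI

client = OpenAI()
model = whisper.load_model("base")  # larger models transcribe better, but slower

def index_video(url: str, audio_path: str, conn: psycopg.Connection) -> None:
    result = model.transcribe(audio_path)   # returns {"text": ..., "segments": [...]}
    for seg in result["segments"]:
        vec = client.embeddings.create(
            model="text-embedding-3-small",
            dimensions=768,                 # trim to match halfvec(768)
            input=seg["text"],
        ).data[0].embedding
        conn.execute(
            "INSERT INTO chunks (video_url, start_s, body, embedding) "
            "VALUES (%s, %s, %s, %s::halfvec)",
            (url, seg["start"], seg["text"], "[" + ",".join(map(str, vec)) + "]"),
        )
    conn.commit()
```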
Strengths: total control, no recurring spend, infinite saves.
Trade-offs: infinite saves for you — sharing or accessing on mobile means more plumbing. Whisper is excellent for English but gets noticeably worse on Russian, Hindi, and Arabic compared to commercial APIs fine-tuned for those languages. And every platform change (Instagram shipped a redesigned Reels API in March 2026; TikTok rotated its watermark format twice last year) is a maintenance ticket.
Step-by-step: set up transcript search with SavedThat
Eight minutes from sign-up to your first searched-and-found moment.
1. Sign up and skip the empty state
Open savedthat.app and sign up with email or Google. The onboarding overlay shows a paste field — drop in your first URL, or click Try with a demo video to load a canned 3-minute clip. You don't need a credit card, and the free tier is unlimited in time (just capped at 30 saves per month).
2. Paste a video URL
The paste field accepts any of these formats:
- `https://www.youtube.com/watch?v=...`
- `https://youtu.be/...` (shortened)
- `https://www.youtube.com/shorts/...`
- `https://www.instagram.com/reel/...`
- `https://www.tiktok.com/@user/video/...`
Click Save. The video lands in your library within ~10 seconds for YouTube (transcript fetch is fast), 30–60 seconds for Instagram and TikTok (audio transcription is slower). If someone else has already saved that exact URL, yours appears instantly — we share the indexed transcript across users.
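Under the hood, routing those formats to the right pipeline is pattern matching. A toy classifier whose regexes mirror the list above (illustrative, not our actual matcher):

```python
# Toy URL classifier; patterns mirror the accepted formats listed above.
import re

PATTERNS = {
    "youtube": re.compile(r"(youtube\.com/(watch\?v=|shorts/)|youtu\.be/)"),
    "instagram": re.compile(r"instagram\.com/reel/"),
    "tiktok": re.compile(r"tiktok\.com/@[^/]+/video/"),
}

def detect_platform(url: str) -> str | None:
    for platform, pattern in PATTERNS.items():
        if pattern.search(url):
            return platform
    return None  # unsupported URL: reject before spending a save credit

assert detect_platform("https://youtu.be/dQw4w9WgXcQ") == "youtube"
assert detect_platform("https://www.instagram.com/reel/Cabc123/") == "instagram"
```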
3. Search by what was said
Once you have at least one video, the ask field at the top of the library accepts free-text queries. Try one of these to see the difference between keyword and semantic:
- Direct phrase: `"customer acquisition cost"` finds the exact mention.
- Paraphrased intent: `"how much does it cost to find new buyers"` — semantic search catches the same idea even though no word in the query appears in the transcript.
- Cross-language: `"стоимость привлечения клиентов"` in Russian — the multilingual `text-embedding-3-small` model maps Russian and English vectors close enough that an English transcript still matches.
Each result shows the exact quote, the timestamp, and a deep-link that opens the original video at that second on the right platform.
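For YouTube the deep-link format is public: a t parameter in seconds. A sketch of building one (Instagram and TikTok have no comparable public timestamp parameter, so a real tool would fall back to the plain video URL there):

```python
# Build a timestamped YouTube deep-link; YouTube honours t= in seconds.
def youtube_deeplink(video_id: str, seconds: float) -> str:
    return f"https://www.youtube.com/watch?v={video_id}&t={int(seconds)}s"

print(youtube_deeplink("dQw4w9WgXcQ", 95.4))
# https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=95s
```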
4. Use share links to come back later
If you find a moment worth keeping or sending to someone, click Share on the result. SavedThat issues a short URL like savedthat.app/s/abc123 that opens the original video at the timestamp directly — no SavedThat account required for the recipient.
Edge cases worth knowing
Videos with no transcript. YouTube has captions for ~84% of uploads. Instagram/TikTok auto-generate from audio. When neither works (rare, mostly music videos and silent montages), the save still appears in your library with the title and thumbnail — it's just not searchable until a transcript becomes available. A save without a transcript doesn't count against your free-tier quota.
Multilingual videos. Hybrid search is multilingual at the vector layer because OpenAI's embedding model was trained across 100+ languages. The full-text-search component does language detection per chunk and uses Postgres' tsvector with the right language config. A query in Russian against an English transcript will still hit semantic matches; the FTS bonus only kicks in when the query language matches the chunk language.
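A sketch of that per-chunk decision, assuming the langdetect package and a tsv column on a chunks table like the one in the DIY section; the config map is illustrative:

```python
# Pick a Postgres text-search config per chunk from its detected language.
from langdetect import detect   # pip install langdetect
import psycopg

TS_CONFIGS = {"en": "english", "ru": "russian", "es": "spanish", "pt": "portuguese"}

def ts_config_for(text: str) -> str:
    try:
        return TS_CONFIGS.get(detect(text), "simple")  # "simple" = no stemming
    except Exception:
        return "simple"  # detection failed: index without stemming

def index_chunk_fts(conn: psycopg.Connection, chunk_id: int, text: str) -> None:
    conn.execute(
        "UPDATE chunks SET tsv = to_tsvector(%s::regconfig, %s) WHERE id = %s",
        (ts_config_for(text), text, chunk_id),
    )
```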
Long videos. A 4-hour podcast becomes 800+ chunks. SavedThat caps single-video duration at 1h on free, 2h on Pro, 3h on Power — past that we'd be paying transcription costs that don't scale on the consumer pricing tiers. Hybrid search performance stays under 200ms even on libraries with 50K+ chunks because we use HNSW indexing on halfvec(768) embeddings.
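In pgvector terms that's one index plus one ORDER BY. A sketch against the hypothetical chunks table from the DIY section:

```python
# One HNSW index makes nearest-neighbour lookups sub-linear; <=> is cosine distance.
import psycopg

with psycopg.connect("dbname=savedthat") as conn:
    conn.execute(
        "CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw "
        "ON chunks USING hnsw (embedding halfvec_cosine_ops)"
    )
    # In practice query_vec comes from embedding the user's query text.
    query_vec = "[" + ",".join(["0.01"] * 768) + "]"
    rows = conn.execute(
        "SELECT video_url, start_s, body FROM chunks "
        "ORDER BY embedding <=> %s::halfvec LIMIT 10",
        (query_vec,),
    ).fetchall()
```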
Privacy. Transcripts and your save list live in your private library by default. Nothing is shared unless you explicitly issue a share link.
The honest verdict
If you save more than five videos a month and at least once per quarter you find yourself thinking "I know I saw this somewhere", SavedThat is built for you; it's also the only one of the four with first-class Instagram and TikTok support. If you're a YouTube-only highlights person who likes annotating, Glasp is great and free. If you're a Mem power user who already lives there, paste in the transcripts and stay. If you're a developer who enjoys plumbing, the DIY stack is honest work.
Whichever you pick, the era of "I saved this somewhere" is ending. Search inside the videos, not around them.
Frequently asked questions
Does YouTube already let you search inside videos you saved?
YouTube's native search only matches video titles, descriptions, and tags from your Watch Later or saved playlists — not the spoken content. There is no built-in 'search across the transcripts of all my saved videos' feature. Third-party tools like SavedThat exist because YouTube's product roadmap has not prioritised this since at least 2020.
Can I search Instagram Reels and TikTok bookmarks the same way?
Yes, but you need a tool that fetches the audio transcript for those platforms — neither Instagram nor TikTok exposes a transcript API the way YouTube does. SavedThat uses Supadata to auto-transcribe Reels and TikToks at save time. Glasp and Mem only support YouTube; native Instagram/TikTok search inside your saves is not available without a third-party tool.
How accurate is auto-generated transcription?
For English in standard recording conditions, modern auto-captions hit roughly 95% word accuracy. For Russian, Spanish, Portuguese, and other languages with strong language-model coverage, 90–94% is typical. Heavy accents, music backing, and overlapping speakers still hurt accuracy. Search remains useful well below perfect transcription because semantic embeddings handle near-misses gracefully.
What's the difference between full-text search and semantic search?
Full-text search matches the literal words in your query against the words in the transcript. Semantic search converts both the query and the transcripts into numerical vectors that represent meaning, then returns the closest matches by cosine distance. Semantic search finds 'how much does it cost to acquire customers' when the transcript says 'CAC' or 'customer acquisition cost'; full-text would miss it. The best tools combine both.
Will saved videos still work if the original is deleted from YouTube?
The transcript and your bookmark stay in your library, but the deep-links point to the original platform — if YouTube removes the video, the link breaks. SavedThat keeps the transcript text indefinitely so search and quotes still work, but cannot replay the audio or video. For long-term archival of the actual media, you need a separate solution.
Is this expensive at scale?
On consumer plans, no. The bulk of the cost is one-time transcription at save (free for YouTube, paid per-Reel for Instagram/TikTok). Embeddings are pennies per 1,000 chunks via OpenAI's text-embedding-3-small. Storage of 768-dimension halfvecs in Postgres is on the order of 1.5KB per chunk. A 50,000-chunk personal library fits comfortably in a $25/month Postgres instance — and on a hosted tool like SavedThat the unit economics are factored into the $6.99 Pro tier.
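The arithmetic behind the per-chunk figure, if you want to check it:

```python
# Half-precision floats are 2 bytes each, so a 768-dim halfvec is:
dims, n_chunks = 768, 50_000
bytes_per_chunk = dims * 2                        # 1,536 B ≈ 1.5 KB
library_mb = bytes_per_chunk * n_chunks / 1e6     # vectors only, before index overhead
print(bytes_per_chunk, round(library_mb))         # 1536, 77 (MB)
```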