Scrape YouTube Channel Transcripts to RAG

Fetch videos from a YouTube channel via Supadata, extract transcripts with metadata, and ingest them into a Needle collection with rich labels for retrieval.

YouTube, Transcript Extraction, RAG, Knowledge Base, AI Search, Metadata Labeling, Channel Intelligence
Needle Team

Key Takeaways

  • Bulk channel ingestion: Pulls hundreds of YouTube videos from one channel handle in a single run
  • Transcript-first RAG pipeline: Converts each transcript into a Markdown file and stores it in your Needle collection
  • Rich labeling for retrieval: Adds structured labels such as videoId, durationMinutes, topic_*, and mentions_*
  • Two-stage loop architecture: Fetches all video IDs first, then iterates over each video to extract and index its transcript
  • Built for AI agent memory: Gives your collection searchable long-form video knowledge, not just links

What This Workflow Does

This workflow turns a YouTube channel into a searchable RAG knowledge base.

You provide a channel handle (for example @n8n) in the Manual Trigger. The workflow then:

  1. Gets the IDs of all videos and Shorts on the channel from Supadata
  2. Loops over each video ID and fetches its transcript
  3. Normalizes the transcript text and generates a Markdown file per video
  4. Adds each transcript file to your Needle collection
  5. Adds metadata labels to each file to support both semantic and filter-based retrieval
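
The steps above can be sketched as a two-stage loop. This is a minimal Python illustration, not the workflow's actual node code: `ingest_channel`, `transcript_to_markdown`, and the three injected callables are hypothetical names standing in for the Supadata and Needle calls, and the Markdown layout is an assumption. Labeling (step 5) is left out to keep the sketch short.

```python
from typing import Callable, Iterable, Optional


def transcript_to_markdown(video_id: str, text: str) -> str:
    """Collapse whitespace and wrap the transcript in a small Markdown document."""
    body = " ".join(text.split())
    url = f"https://www.youtube.com/watch?v={video_id}"
    # Assumed layout: a title line, a source line, then the transcript body.
    return f"# Transcript {video_id}\n\nSource: {url}\n\n{body}\n"


def ingest_channel(
    handle: str,
    list_video_ids: Callable[[str], Iterable[str]],    # stand-in for the Supadata ID fetch
    fetch_transcript: Callable[[str], Optional[str]],  # stand-in for the transcript fetch
    add_file: Callable[[str, str], None],              # stand-in for the Needle "add file" node
) -> dict:
    """Stage 1 collects IDs; stage 2 loops over them and indexes each transcript."""
    added, skipped = [], []
    for video_id in list_video_ids(handle):
        text = fetch_transcript(video_id)
        if not text or not text.strip():
            # No transcript available: record it and keep looping.
            skipped.append(video_id)
            continue
        add_file(f"{video_id}.md", transcript_to_markdown(video_id, text))
        added.append(video_id)
    return {"added": added, "skipped": skipped}


if __name__ == "__main__":
    # Fake data so the sketch runs without any network access.
    fake_transcripts = {"a1": "hello   world\n again", "b2": None}
    collection = {}
    summary = ingest_channel(
        "@n8n",
        lambda handle: ["a1", "b2"],
        fake_transcripts.get,
        collection.__setitem__,
    )
    print(summary)
```

Passing the fetch and store steps in as callables keeps the loop logic testable on its own; in the real workflow those roles are played by the Supadata requests and the Needle nodes.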

Setup

  1. Create a Supadata account and API key
  2. Set SUPADATA_API_KEY as a secret workflow variable
  3. Select your target Needle collection in both Needle nodes
  4. Run with a channel handle like @channel_name
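
If you want to sanity-check your inputs before a run, a pre-flight check along these lines may help. It is only a sketch: it assumes the key is also available as an environment variable named SUPADATA_API_KEY (in the workflow itself it is a secret workflow variable), and the handle pattern is an assumption that covers Latin-character handles only. `normalize_handle` and `load_api_key` are hypothetical helper names.

```python
import os
import re

# Assumed pattern: "@" followed by 3 to 30 letters, digits, dots, underscores, or hyphens.
HANDLE_PATTERN = re.compile(r"^@[A-Za-z0-9._-]{3,30}$")


def normalize_handle(raw: str) -> str:
    """Trim whitespace, add a leading '@' if missing, and validate the result."""
    handle = raw.strip()
    if not handle.startswith("@"):
        handle = "@" + handle
    if not HANDLE_PATTERN.match(handle):
        raise ValueError(f"Unexpected channel handle: {raw!r}")
    return handle


def load_api_key(env=os.environ) -> str:
    """Read SUPADATA_API_KEY from a mapping and fail early if it is empty."""
    key = env.get("SUPADATA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SUPADATA_API_KEY is not set")
    return key


if __name__ == "__main__":
    print(normalize_handle(" n8n "))
```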

Labels Added

  • Source + identity: source, videoId, videoUrl, channelHandle
  • Video shape: isShort, videoType, language
  • Stats: wordCount, characterCount, durationMinutes, lengthCategory
  • Time buckets: indexedDate, indexedYearMonth, indexedYear, indexedMonth
  • Flags: isTutorial, isReview, isNews, hasLiveDemo, hasCTA
  • Dynamic topic labels: topic_*
  • Dynamic tool labels: mentions_*
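
To make the label groups above concrete, here is one way a subset of them could be derived from a transcript. This is a hedged sketch, not the workflow's actual labeling logic: the keyword lists, the length thresholds, the label value types, and the `build_labels` function itself are all assumptions made for illustration.

```python
import re
from datetime import date

# Hypothetical keyword maps; the real workflow's topic and tool detection may differ.
TOPIC_KEYWORDS = {"automation": ["workflow", "automate"], "rag": ["rag", "retrieval"]}
TOOL_NAMES = ["n8n", "needle", "supadata"]


def build_labels(video_id, handle, transcript, duration_seconds, is_short, today=None):
    """Derive a subset of the labels listed above from one transcript."""
    today = today or date.today()
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    vocab = set(words)
    minutes = round(duration_seconds / 60, 1)
    # Assumed thresholds for lengthCategory.
    if minutes < 5:
        category = "short"
    elif minutes < 20:
        category = "medium"
    else:
        category = "long"
    labels = {
        "source": "youtube",
        "videoId": video_id,
        "videoUrl": f"https://www.youtube.com/watch?v={video_id}",
        "channelHandle": handle,
        "isShort": is_short,
        "videoType": "short" if is_short else "video",
        "wordCount": len(words),
        "characterCount": len(transcript),
        "durationMinutes": minutes,
        "lengthCategory": category,
        "indexedDate": today.isoformat(),
        "indexedYearMonth": today.strftime("%Y-%m"),
        "indexedYear": today.year,
        "indexedMonth": today.month,
        "isTutorial": "tutorial" in vocab,
    }
    # Dynamic labels are only added when a keyword or tool name actually appears.
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(keyword in vocab for keyword in keywords):
            labels[f"topic_{topic}"] = True
    for tool in TOOL_NAMES:
        if tool in vocab:
            labels[f"mentions_{tool}"] = True
    return labels


if __name__ == "__main__":
    demo = build_labels(
        "abc123", "@n8n",
        "In this tutorial we automate a workflow with n8n and Needle.",
        600, False, today=date(2025, 1, 15),
    )
    for name, value in demo.items():
        print(f"{name}: {value}")
```

Because topic_* and mentions_* labels exist only when matched, a retrieval filter can test for the presence of a label such as mentions_n8n rather than comparing values.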

Troubleshooting

  • No files added: Verify the channel handle and the API key.
  • Transcript missing: Some videos do not expose transcripts; the loop skips them and continues by design.
  • Labels missing: Ensure both Needle nodes target the same collection.
  • Run is slow: Lower the initial fetch limit or increase the wait tolerance.
