Create An Agentic Chat For Your Audio Files
Turn your audio recordings into an AI-powered chat. Transcribe audio files from Google Drive with speaker diarization, then ask questions and get answers grounded in your audio content.
Last updated: February 4, 2026
Key Takeaways
- Chat with any audio - Transcribe recordings and ask questions through AI chat
- Speaker diarization - AI identifies who said what in multi-speaker recordings
- Batch processing - Upload a folder of audio files and process them all at once
- No coding required - Visual workflow builder with drag-and-drop setup
- Multiple formats - Supports MP3, WAV, M4A, FLAC, and OGG
What This Workflow Does
This Needle workflow turns audio files stored in Google Drive into a searchable, chat-enabled knowledge base. It transcribes recordings with speaker labels, formats the output as markdown, and adds everything to a Needle collection for AI-powered Q&A.
Use cases:
- Meeting archives - find decisions and action items from past meetings
- Podcast research - ask questions across multiple podcast episodes
- Interview analysis - query customer interviews for insights
- Lecture notes - turn recorded lectures into an AI study resource
- Sales calls - extract objections, questions, and opportunities
How It Works
| Step | What Happens |
|---|---|
| 1. Connect Google Drive folder | The workflow lists all audio files in a specified Google Drive folder |
| 2. Download and queue | Each audio file is downloaded and its URL is prepared for transcription |
| 3. Transcribe with AssemblyAI | Audio is converted to text with speaker diarization, punctuation, and timestamps |
| 4. Format as markdown | Transcripts are converted to clean markdown with speaker labels, duration, word count, and confidence score |
| 5. Add to Needle collection | Formatted transcripts are added to your Needle collection, enabling semantic search and AI chat |
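Step 4 can be illustrated with a minimal sketch. The workflow's actual formatting step is not shown here, so the `Utterance` shape, the function names, and the markdown layout below are all assumptions chosen for illustration. The sketch only shows how speaker labels, duration, word count, and a confidence score could be combined into one markdown document.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One diarized speaker turn (hypothetical shape, times in milliseconds)."""
    speaker: str
    text: str
    start_ms: int
    end_ms: int
    confidence: float


def format_timestamp(ms: int) -> str:
    """Render milliseconds as MM:SS, or HH:MM:SS for an hour or longer."""
    total_seconds = ms // 1000
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    if hours:
        return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    return f"{minutes:02d}:{seconds:02d}"


def transcript_to_markdown(title: str, utterances: list[Utterance]) -> str:
    """Build a markdown document with summary metadata and speaker-labelled turns."""
    word_count = sum(len(u.text.split()) for u in utterances)
    duration_ms = max((u.end_ms for u in utterances), default=0)
    # Unweighted mean of per-utterance confidence; 0.0 for an empty transcript.
    avg_conf = (
        sum(u.confidence for u in utterances) / len(utterances) if utterances else 0.0
    )
    lines = [
        f"# {title}",
        "",
        f"- Duration: {format_timestamp(duration_ms)}",
        f"- Word count: {word_count}",
        f"- Confidence: {avg_conf:.0%}",
        "",
    ]
    for u in utterances:
        stamp = format_timestamp(u.start_ms)
        lines.append(f"**Speaker {u.speaker}** [{stamp}]: {u.text}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Putting the metadata in a short list at the top keeps it searchable alongside the transcript text once the file is added to a collection.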
Supported Audio Formats
| Format | Extension |
|---|---|
| MP3 | .mp3 |
| WAV | .wav |
| M4A | .m4a |
| FLAC | .flac |
| OGG | .ogg |
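A file filter for these formats could look like the sketch below. It is an illustration only: the names `SUPPORTED_EXTENSIONS`, `is_supported_audio`, and `filter_audio_files` are hypothetical, and the workflow's own filter may work differently.

```python
from pathlib import PurePosixPath

# Extensions taken from the table above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def is_supported_audio(filename: str) -> bool:
    """Return True when the file's final extension is a supported audio format."""
    # Lowercase the suffix so "CALL.MP3" and "call.mp3" are treated the same.
    return PurePosixPath(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def filter_audio_files(filenames: list[str]) -> list[str]:
    """Keep only supported audio files, preserving the original order."""
    return [name for name in filenames if is_supported_audio(name)]
```

To include or exclude a format, add or remove its extension in the set.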
AssemblyAI Transcription Features
| Feature | What It Does |
|---|---|
| Speech-to-Text | Converts audio to text |
| Speaker Diarization | Labels who said what |
| Punctuation | Adds periods and commas |
| Timestamps | Marks time positions in the audio |
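For readers curious what such a request might look like outside the visual builder, here is a hedged sketch that builds (but does not send) a transcription request for AssemblyAI's REST API. The "Transcribe" node's real configuration is not shown on this page, so the choice of options is an assumption; `build_transcript_request` is a hypothetical helper name. No timestamp flag is set because the sketch assumes the transcript response already carries start and end times.

```python
import json
import urllib.request

ASSEMBLYAI_TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(audio_url: str, api_key: str) -> urllib.request.Request:
    """Build a POST request asking for a diarized, punctuated transcript."""
    payload = {
        "audio_url": audio_url,      # publicly reachable URL of the audio file
        "speaker_labels": True,      # speaker diarization
        "punctuate": True,           # add punctuation
        "format_text": True,         # apply casing and text formatting
        "language_detection": True,  # assumption: mirrors the FAQ's auto-detection
    }
    return urllib.request.Request(
        ASSEMBLYAI_TRANSCRIPT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the job; the sketch stops short of that so it can be inspected without an API key.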
Requirements
| Tool | Cost | Purpose |
|---|---|---|
| Needle Account | Free | Workflow + RAG |
| Google Drive | Free | Audio storage |
| AssemblyAI | Pay-per-minute | Transcription |
Setup Instructions
- Add the workflow template to Needle
- Click the "List Files" node and connect your Google Drive account
- Paste your Google Drive folder URL into the "List Files" node's instructions
- Sign up at AssemblyAI and get your API key
- Click the "Transcribe" node and connect with your AssemblyAI API key
- Click the "Add Files to Collection" node and choose your target Needle collection (or create a new one)
Customization
| What You Can Change | How |
|---|---|
| Audio source folder | Update the Google Drive folder URL in the "List Files" node |
| Target collection | Select a different Needle collection in the "Add Files to Collection" node |
| Audio formats to process | Adjust the file filter in the workflow to include or exclude specific formats |
| Transcript formatting | Modify the markdown formatting step to change the output structure |
Tips for Better Results
| Tip | Why |
|---|---|
| Use clear recordings | Better audio quality produces more accurate transcriptions |
| Name files descriptively | Descriptive filenames make it easier to identify transcripts later |
| Batch by project | Create separate collections for different topics or projects |
| Review first transcript | Check accuracy before processing large batches |
FAQ
Q: Does it work with multiple languages? A: AssemblyAI supports 100+ languages. The workflow uses automatic language detection.
Q: Can I transcribe video files? A: Yes, AssemblyAI extracts audio from video files. Supported formats include MP4, MOV, and AVI. You may need to adjust the workflow's file filter to include those extensions.
Q: What if a transcription fails? A: The workflow has error handling. Check the AssemblyAI dashboard for details on any failures.
Q: How many speakers can it identify? A: Speaker diarization works best with 2-10 speakers. More speakers may reduce accuracy.
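As an illustration of the failure case in the FAQ above, the sketch below polls a transcription job until it completes or reports an error. It is not the workflow's actual error handling, which is not described here. It assumes the job status is one of `queued`, `processing`, `completed`, or `error`, with a message in the `error` field; `wait_for_transcript` and its parameters are hypothetical names, and `fetch_status` stands in for whatever call retrieves the job's current state.

```python
import time
from typing import Callable


def wait_for_transcript(
    fetch_status: Callable[[], dict],
    poll_seconds: float = 3.0,
    max_polls: int = 200,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job completes; raise if it fails or takes too long."""
    for _ in range(max_polls):
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "error":
            # Surface the service's own message so the failure is diagnosable.
            message = result.get("error", "unknown error")
            raise RuntimeError(f"Transcription failed: {message}")
        # Still queued or processing: wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(f"Transcript not ready after {max_polls} polls")
```

Injecting `fetch_status` and `sleep` keeps the loop easy to exercise without network access or real delays.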