Extract Data From PDF

Automate the conversion of unstructured documents into organized, searchable data records in Google Sheets, streamlining data management and eliminating manual entry.
Tired of manually extracting data from documents? This tutorial shows how to build a document data extraction workflow in Needle that converts unstructured document content into structured Google Sheets data using AI.
Overview
The workflow processes every document in a Needle collection, extracts specific information by asking an AI model custom questions, and writes the structured results to Google Sheets, turning document chaos into organized data in minutes.
Key Actions
- Collection Setup – Upload documents to a Needle collection
- Batch Processing – Loop through all files in the collection automatically
- AI Question Answering – Ask specific questions to extract targeted information from each document
- Data Structuring – Transform and merge AI responses into structured records
- Google Sheets Export – AI agent writes data to your spreadsheet using configured tools
What You'll Need
- Needle Collection – Create a collection at needle.app/dashboard/collections and upload your documents. Tip: collections support PDFs, Word docs, and other document formats.
- Custom Questions – Define what information you want to extract (e.g., "What is the ICD code?", "When was the diagnosis created?")
- Google Sheets – Link your Google Sheets account and prepare a target spreadsheet with appropriate columns
- Google Sheets Connector – Configure connector in Needle to enable the AI agent to write data
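Before building the workflow, it can help to write down which question feeds which spreadsheet column. The following is a minimal planning sketch in plain Python, not a Needle configuration format; the `ExtractionField` class, the column names, and the `file_name` key column are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionField:
    """One question/column pair (hypothetical helper, not part of Needle)."""
    column: str    # header of the target spreadsheet column (assumed name)
    question: str  # question placed in the system prompt of one AI node


# Example questions taken from this tutorial; column names are assumptions.
FIELDS = [
    ExtractionField("icd_code", "What is the ICD code?"),
    ExtractionField("diagnosis_date", "When was the diagnosis created?"),
]


def header_row(fields):
    """Build the header row: an assumed file_name key column, then one column per question."""
    return ["file_name"] + [field.column for field in fields]


print(header_row(FIELDS))
```

Keeping one entry per question makes it easier to check that the spreadsheet has a column for every AI node you add later.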
How It Works
- Upload documents to your Needle collection—these will be automatically indexed and ready to process
- Trigger the workflow manually to start processing all files in the collection
- The loop mechanism automatically paginates through the files (it processes up to 20 batches of files)
- For each document, multiple AI agents ask specific questions in parallel:
  - Each AI node extracts one piece of information
  - You customize the questions in the system prompt of each node
  - Example questions: "What is the ICD code?", "When was the journey not possible?", "When was the diagnosis created?"
- Transform nodes shape each answer into a structured field
- Merge node combines all extracted data points together
- Code node restructures the data into proper table rows, aligning all fields
- AI agent with Google Sheets tools intelligently writes the data to your spreadsheet:
  - Finds the correct sheet and columns
  - Upserts rows with extracted data
  - Handles multiple rows if processing multiple files
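The merge, restructure, and upsert steps above can be pictured with a short sketch. This is plain Python operating on in-memory lists, not Needle's node code and not the Google Sheets API; the shape of each answer record (`file_name`, `field`, `value`) and the choice of the first column as the upsert key are assumptions.

```python
def merge_answers(answers):
    """Combine per-question answers into one record per file.

    `answers` is assumed to be a list of dicts such as
    {"file_name": "a.pdf", "field": "icd_code", "value": "J45.9"}.
    """
    rows = {}
    for answer in answers:
        name = answer["file_name"]
        row = rows.setdefault(name, {"file_name": name})
        row[answer["field"]] = answer["value"].strip()
    return rows


def to_table(rows_by_file, columns):
    """Align every record to the same column order; missing fields become ""."""
    return [[row.get(col, "") for col in columns] for row in rows_by_file.values()]


def upsert(existing, new_rows, key_index=0):
    """Return a copy of `existing` where rows with a matching key are
    replaced and rows with a new key are appended."""
    result = [list(row) for row in existing]
    position = {row[key_index]: i for i, row in enumerate(result)}
    for row in new_rows:
        key = row[key_index]
        if key in position:
            result[position[key]] = list(row)
        else:
            position[key] = len(result)
            result.append(list(row))
    return result
```

Keying the upsert on the file name means re-running the workflow on the same collection updates existing rows instead of duplicating them, which is the behavior the upsert step describes.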
Customization Tips
- Modify the questions: Update the system prompt in each Needle AI node to ask different questions relevant to your documents
- Add more questions: Duplicate AI nodes and add new transform nodes to extract additional fields
- Adjust the loop: Change the loop condition if you have more/fewer files to process
- Configure Google Sheets: Update the AI agent's instructions to specify your exact spreadsheet URL and column structure
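To reason about the loop condition, it may help to see the pagination pattern written out. The sketch below is a generic capped pagination loop in plain Python, assuming a hypothetical `fetch_page(cursor)` callable that returns a list of files and the next cursor (or `None` when there are no more pages); it is not Needle's loop node, and only the default cap of 20 batches comes from this tutorial.

```python
def iter_batches(fetch_page, max_batches=20):
    """Yield batches of files until the pages run out or the cap is reached.

    `fetch_page` is a hypothetical callable: cursor -> (files, next_cursor).
    """
    cursor = None
    for _ in range(max_batches):
        files, cursor = fetch_page(cursor)
        if files:
            yield files
        if cursor is None:  # no further pages
            break


# Stand-in data source for demonstration only.
_PAGES = [["a.pdf", "b.pdf"], ["c.pdf"], ["d.pdf"]]


def fake_fetch_page(cursor):
    """Serve _PAGES one page at a time, using the page index as the cursor."""
    index = 0 if cursor is None else cursor
    next_cursor = index + 1 if index + 1 < len(_PAGES) else None
    return _PAGES[index], next_cursor


for batch in iter_batches(fake_fetch_page):
    print(batch)
```

With a cap like this, any files beyond the last allowed batch are silently skipped, so a larger collection needs a higher limit or a second run.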
Wrap-up
With this Needle workflow, you can automate the extraction of data from hundreds of documents—perfect for:
- Medical records processing (diagnosis codes, dates, patient info)
- Invoice processing (amounts, vendors, dates)
- Contract analysis (parties, terms, dates)
- Form processing (any structured data extraction)
The workflow eliminates manual transcription and reduces processing time from hours to minutes.