Extract Data From PDF
Automate the conversion of unstructured documents into organized, searchable data records in Google Sheets, streamlining data management and eliminating manual entry.
Last updated: October 1, 2025
About
Tired of manually extracting data from documents? This tutorial shows how to build a document data extraction workflow in Needle that converts unstructured document content into structured Google Sheets data using AI.
Overview
The workflow automatically processes documents from a Needle collection, extracts specific information by asking an AI model custom questions, and writes the structured results to Google Sheets, turning a pile of unstructured documents into organized data.
Key Actions
- Collection Setup - Upload PDFs (or other documents) to a Needle collection
- Batch Processing - Use a loop with paginated “List files” calls to iterate through the collection
- Read Contents - Fetch each file’s text/chunks with “Get file contents”
- AI Extraction - Use AI to answer your extraction questions, e.g. a title and a summary
- Google Sheets Export - Write the extracted data to Google Sheets
What You'll Need
- Needle Collection – Create a collection at needle.app/dashboard/collections and upload your documents. Tip: Needle supports PDFs, Word docs, and other document formats.
- Custom Questions – Define what information you want to extract (dates, amounts, names, codes, etc.)
- Google Sheet – Create a sheet whose header row contains the columns you want to fill
- Google Sheets Connector – Connect your Google Sheets account so the AI node can use tools such as upsert and add rows
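It can help to write the custom questions down as a mapping from sheet column to question before building the workflow. The sketch below is a minimal Python example under that assumption: QUESTIONS and build_extraction_prompt are hypothetical names (not part of Needle), and asking for a JSON reply is one possible prompt design, not the template's own prompt.

```python
# Hypothetical mapping: Google Sheets column header -> question for the AI.
# Replace these with the fields your documents actually contain.
QUESTIONS = {
    "Invoice Date": "What is the invoice date (YYYY-MM-DD)?",
    "Vendor": "Who issued the invoice?",
    "Total Amount": "What is the total amount due, including currency?",
}


def build_extraction_prompt(questions: dict[str, str], document_text: str) -> str:
    """Build a prompt asking the AI to answer every question as one JSON object."""
    lines = [
        "Answer the following questions about the document below.",
        "Reply with a single JSON object whose keys are exactly the field names.",
        "Use null when the answer is not in the document.",
        "",
    ]
    # One bullet per field, so the AI sees which key each answer belongs to.
    for field, question in questions.items():
        lines.append(f"- {field}: {question}")
    lines += ["", "Document:", document_text]
    return "\n".join(lines)
```

Keeping the JSON keys identical to the sheet headers means each answer already knows which column it belongs in.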
How It Works
- Upload documents to a Needle collection.
- Trigger the workflow manually.
- Loop + List files (pagination):
  - The workflow uses a loop to paginate through the collection.
  - “List files” is called with an offset expression, so each iteration fetches the next batch.
- Transform (flatten):
  - The transform node flattens the “List files” output into a single list of files so downstream nodes can iterate file by file.
- Get file contents:
  - For each file, the workflow fetches its contents from the collection.
- AI extraction:
  - One AI node can generate a document title and a short summary.
  - A second AI node, equipped with Google Sheets tools, can take the file and the extracted information and write them to Google Sheets.
- Google Sheets write:
  - Specify the Google Sheets URL and column layout in the AI node's instructions.
  - The AI uses Google Sheets tools (upsert/add/update) to write the data into your sheet.
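The pagination and flattening steps above can be sketched in Python. This is a minimal sketch under stated assumptions: list_files and get_file_contents are hypothetical stand-ins for the “List files” and “Get file contents” nodes (they read from an in-memory fake collection, not the Needle API), the extract callback stands in for the AI node, and the loop stops when a page comes back shorter than the page size. The template's actual loop condition may differ.

```python
from typing import Callable

# Hypothetical in-memory stand-in for a Needle collection. A real workflow
# uses the "List files" and "Get file contents" nodes instead.
FAKE_COLLECTION = [
    {"id": f"file-{i}", "name": f"doc-{i}.pdf", "text": f"Contents of document {i}"}
    for i in range(7)
]


def list_files(collection_id: str, offset: int, limit: int) -> list[dict]:
    """Hypothetical "List files": return one page of file metadata from `offset`."""
    page = FAKE_COLLECTION[offset:offset + limit]
    return [{"id": f["id"], "name": f["name"]} for f in page]


def get_file_contents(collection_id: str, file_id: str) -> str:
    """Hypothetical "Get file contents": return the text of a single file."""
    return next(f["text"] for f in FAKE_COLLECTION if f["id"] == file_id)


def list_all_pages(collection_id: str, page_size: int) -> list[list[dict]]:
    """Loop + List files: advance the offset until a short or empty page arrives."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    pages, offset = [], 0
    while True:
        page = list_files(collection_id, offset=offset, limit=page_size)
        if page:
            pages.append(page)
        if len(page) < page_size:  # fewer files than requested: no more pages
            return pages
        offset += page_size  # plays the role of the offset expression


def flatten(pages: list[list[dict]]) -> list[dict]:
    """Transform: merge the per-page lists into one flat list of files."""
    return [file for page in pages for file in page]


def process_collection(
    collection_id: str, extract: Callable[[str], dict], page_size: int = 3
) -> list[dict]:
    """Fetch each file's text and pass it to `extract` (a stand-in for the AI node)."""
    records = []
    for file in flatten(list_all_pages(collection_id, page_size)):
        text = get_file_contents(collection_id, file["id"])
        record = extract(text)
        record["File"] = file["name"]  # remember which document the row came from
        records.append(record)
    return records
```

With seven fake files and a page size of 3, process_collection("my-collection-id", lambda text: {"Summary": text[:30]}) requests offsets 0, 3, and 6 and returns seven records. Stopping on a short page saves one empty request; if a real listing can return short pages before the end, stop on an empty page instead.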
Customization Tips
- Point to your collection: Replace the collectionId used by “List files” and “Get file contents”.
- Adjust pagination: Update the loop condition and offset logic if you want smaller or larger batches.
- Change what you extract: Edit the AI prompts (title/summary and extraction questions) to match your document types.
- Match your sheet columns: Update the Google Sheets instructions so the AI writes into the exact columns you want.
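In the template, the AI node writes to the sheet itself, guided by its instructions. If you want to check or control column order outside the AI node, a minimal Python sketch might look like the one below. It assumes the AI replies with a JSON object keyed by column name; SHEET_COLUMNS and reply_to_row are hypothetical names.

```python
import json

# Hypothetical header row of the target sheet, in the sheet's column order.
SHEET_COLUMNS = ["File", "Title", "Summary", "Invoice Date", "Total Amount"]


def reply_to_row(ai_reply: str, file_name: str, columns: list[str]) -> list[str]:
    """Parse the AI's JSON reply and order its values to match `columns`.

    Fields the AI omitted (or returned as null) become empty cells, and keys
    that are not sheet columns are ignored.
    """
    data = json.loads(ai_reply)
    if not isinstance(data, dict):
        raise ValueError("expected the AI to reply with a JSON object")
    data["File"] = file_name
    return ["" if data.get(col) is None else str(data[col]) for col in columns]
```

The returned list is one row in the sheet's column order, so a reordered or renamed column only requires changing SHEET_COLUMNS.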
Wrap-up
With this Needle workflow, you can automate data extraction from hundreds of documents. It is well suited to:
- Medical records processing (diagnosis codes, dates, patient info)
- Invoice processing (amounts, vendors, dates)
- Contract analysis (parties, terms, dates)
- Form processing (any structured data extraction)
By replacing manual transcription, the workflow can cut processing time from hours to minutes.