Workflow

Scrape and Summarize Websites

Automatically scrape webpages from a Google Sheets URL list, use AI to extract structured summaries with key points, quotes, and links, then write results to Google Docs for content research and briefing.

Needle Team

Last updated

October 1, 2025

Connectors used

Google Sheets
Google Docs

Tags

Web Scraping Tools, AI Content Summarization, Data Extraction, Content Analysis

Key Takeaways

  • Scheduled daily scraping - Runs automatically every day at 9:00 AM UTC to process new URLs from your spreadsheet
  • Structured AI summaries - GPT-4.1 extracts title, key points, quotes, links, and a 2-4 sentence summary for each page
  • Google Sheets to Google Docs pipeline - Reads URLs from a spreadsheet and writes formatted summaries to a Google Doc
  • Automatic data parsing - Code nodes handle header mapping and array flattening so the output is clean and readable

What This Workflow Does

This Needle workflow reads a list of URLs from a Google Sheets spreadsheet, scrapes each webpage using the built-in Browse Web tool, and then sends the content to GPT-4.1 for structured summarization. The AI extracts the title, key points, notable quotes, important outbound links, and a short summary for each page. The results are cleaned up by code and transform nodes, then appended to a Google Doc as formatted output.

Use cases:

  • Compiling research summaries from a list of articles or blog posts
  • Monitoring competitor content by scraping and summarizing their pages regularly
  • Building daily content digests or briefing documents from multiple web sources

How It Works

Step | What Happens
1. Scheduled Trigger | Fires daily at 9:00 AM UTC (configurable cron schedule)
2. Google Sheets: Get Values | Reads all rows from your URL spreadsheet, including headers
3. Code (JavaScript) | Parses the raw sheet data into objects using the first row as column headers
4. Browse Web | Fetches the full HTML content of each URL
5. AI Agent (GPT-4.1) | Produces a structured JSON summary with URL, Title, Key Points, Quotes, Links, and Summary
6. Transform | Strips any markdown code fences from the AI output and parses the JSON
7. Code (JavaScript) | Flattens arrays (Key Points, Quotes, Links) into pipe-separated strings
8. Google Docs: Append Text | Writes the formatted summaries to your Google Doc
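
The Transform step (step 6) can be pictured with a small sketch. This is not Needle's actual Transform code: the function name parseAiJson and the assumption that the model's reply arrives as a single string are illustrative only. The fence marker is built with repeat() so the listing does not contain a literal code fence.

```javascript
// Hypothetical helper, not Needle's built-in Transform implementation.
// Assumes the AI reply is one string that may be wrapped in a markdown code fence.
const FENCE = "`".repeat(3); // three backticks
const FENCED_BLOCK = new RegExp(
  "^" + FENCE + "[a-zA-Z]*\\s*([\\s\\S]*?)\\s*" + FENCE + "$"
);

function parseAiJson(raw) {
  const text = String(raw).trim();
  // If the whole reply is a fenced block, keep only what is inside the fence.
  const match = text.match(FENCED_BLOCK);
  const body = match ? match[1] : text;
  // JSON.parse throws a SyntaxError if the model returned malformed JSON.
  return JSON.parse(body);
}
```

A reply that is already bare JSON passes through unchanged, so the same helper would cover both cases.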

Workflow Nodes

Node | Role
Scheduled Trigger | Runs the workflow daily at 9:00 AM UTC
Google Sheets: Get Values in Range | Reads all data from the URL spreadsheet including headers
Code (JavaScript) | Maps raw rows into structured objects using headers as keys
Browse Web | Fetches webpage content for each URL
AI Agent (GPT-4.1) | Extracts structured JSON with title, key points, quotes, links, and summary
Transform | Parses the AI's JSON output, removing any code fence formatting
Code (JavaScript) | Flattens array fields into pipe-separated strings for clean output
Google Docs: Append Text | Appends the final formatted summaries to your Google Doc
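
To illustrate the array-flattening role of the second Code node, here is a minimal sketch. The function name flattenSummary and the " | " separator (with surrounding spaces) are assumptions; the template's actual code and exact separator may differ.

```javascript
// Hypothetical sketch of the flattening step, not the template's actual code.
// Every array field (Key Points, Quotes, Links) becomes one pipe-separated string;
// non-array fields are copied through unchanged.
function flattenSummary(summary, separator = " | ") {
  const flat = {};
  for (const [key, value] of Object.entries(summary)) {
    flat[key] = Array.isArray(value) ? value.join(separator) : value;
  }
  return flat;
}
```

An empty array, such as a page with no quotes, flattens to an empty string, which keeps the appended text free of "undefined" or bracket noise.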

Setup Instructions

  1. Add the "Scrape and Summarize Websites" workflow template to your Needle workspace
  2. Connect your Google Sheets account and create a spreadsheet with a "URL" column header in row 1 and URLs in the rows below
  3. Connect your Google Docs account and set the document URL where summaries should be written
  4. Update the Google Sheets and Google Docs URLs in the respective nodes to point to your own documents

Customization

What You Can Change | How
Schedule frequency | Edit the Scheduled Trigger's cron expression (default is 0 9 * * * for daily at 9 AM UTC)
Timezone | Change the timezone setting in the Scheduled Trigger node
Summary structure | Modify the AI Agent's prompt to request different fields or a different output format
AI model | Change the model in the AI Agent node (default is GPT-4.1)
Output destination | Replace the Google Docs node with a different output tool, like Google Sheets or Slack
URL spreadsheet | Update the Google Sheets URL in the Get Values node to point to your own sheet
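
For readers unfamiliar with cron syntax, the default expression 0 9 * * * can be read field by field. The sketch below assumes the standard five-field format (minute, hour, day of month, month, day of week); describeCron is a hypothetical helper, and which cron extensions the Scheduled Trigger accepts is not confirmed here.

```javascript
// Hypothetical helper that labels the five fields of a standard cron expression.
function describeCron(expression) {
  const parts = expression.trim().split(/\s+/);
  if (parts.length !== 5) {
    throw new Error("Expected 5 cron fields, got " + parts.length);
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] = parts;
  return { minute, hour, dayOfMonth, month, dayOfWeek };
}

// The template default: minute 0 of hour 9, every day of every month.
const daily = describeCron("0 9 * * *");

// Two variations in standard cron syntax:
const weekdaysOnly = describeCron("0 9 * * 1-5"); // 9:00 AM, Monday to Friday
const everySixHours = describeCron("0 */6 * * *"); // minute 0 of every 6th hour
```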

FAQ

Q: What format does the input spreadsheet need?
A: The first row should contain column headers, with at least a "URL" column. Put your URLs starting from row 2. The Code node automatically maps headers to values.
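
As a rough picture of that header mapping, the sketch below assumes the Google Sheets node hands the Code node a plain array of rows, where each row is an array of cell values and the first row holds the headers. The function name rowsToObjects and that input shape are assumptions, not the template's actual code.

```javascript
// Hypothetical sketch of the header-mapping step.
// Input (assumed shape): [["URL", "Notes"], ["https://example.com/a", "first"], ...]
function rowsToObjects(values) {
  if (!Array.isArray(values) || values.length === 0) return [];
  const [headers, ...rows] = values;
  return rows
    // Skip rows where every cell is blank.
    .filter((row) => row.some((cell) => String(cell ?? "").trim() !== ""))
    // Pair each header with the cell in the same column; rows may be shorter
    // than the header row, so missing cells fall back to an empty string.
    .map((row) =>
      Object.fromEntries(
        headers.map((header, i) => [String(header).trim(), row[i] ?? ""])
      )
    );
}
```

With this shape, each downstream item can read its address as item.URL regardless of which column the "URL" header sits in.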

Q: How does the AI structure its output?
A: The AI is prompted to return a JSON object with six fields: URL, Title, Key Points (array of 3-7 bullets), Quotes (array of 0-3 quotes), Links (array of 0-5 outbound links), and Summary (2-4 sentences).
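
Because the model is only prompted to follow this shape, a downstream check can be useful. The sketch below encodes the six fields and the array size ranges stated above; validateSummary is a hypothetical helper and is not part of the template.

```javascript
// Hypothetical validator for the six-field summary object described in the prompt.
const ARRAY_LIMITS = {
  "Key Points": [3, 7],
  Quotes: [0, 3],
  Links: [0, 5],
};

// Returns a list of human-readable problems; an empty list means the shape is as expected.
function validateSummary(summary) {
  const problems = [];
  for (const field of ["URL", "Title", "Summary"]) {
    if (typeof summary[field] !== "string" || summary[field].trim() === "") {
      problems.push(field + " must be a non-empty string");
    }
  }
  for (const [field, [min, max]] of Object.entries(ARRAY_LIMITS)) {
    const value = summary[field];
    if (!Array.isArray(value)) {
      problems.push(field + " must be an array");
    } else if (value.length < min || value.length > max) {
      problems.push(field + " should have " + min + "-" + max + " items, got " + value.length);
    }
  }
  return problems;
}
```

The sentence count of Summary is deliberately not checked here, since splitting prose into sentences reliably is harder than counting array items.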

Q: Can I run this manually instead of on a schedule?
A: Yes. Replace the Scheduled Trigger with a Manual Trigger if you want to run it on demand.

Q: What happens if a URL is unreachable?
A: The Browse Web node will attempt to fetch each URL. If a page is unreachable, that item may fail or return empty content, and the AI will process whatever content it receives.
