Scrape and Summarize Websites
Automatically scrape webpages from a Google Sheets URL list, use AI to extract structured summaries with key points, quotes, and links, then write results to Google Docs for content research and briefing.
Last updated
October 1, 2025
Key Takeaways
- Scheduled daily scraping - Runs automatically every day at 9:00 AM UTC to process new URLs from your spreadsheet
- Structured AI summaries - GPT-4.1 extracts title, key points, quotes, links, and a 2-4 sentence summary for each page
- Google Sheets to Google Docs pipeline - Reads URLs from a spreadsheet and writes formatted summaries to a Google Doc
- Automatic data parsing - Code nodes handle header mapping and array flattening so the output is clean and readable
What This Workflow Does
This Needle workflow reads a list of URLs from a Google Sheets spreadsheet, scrapes each webpage using the built-in Browse Web tool, and then sends the content to GPT-4.1 for structured summarization. The AI extracts the title, key points, notable quotes, important outbound links, and a short summary for each page. The results are cleaned up by code and transform nodes, then appended to a Google Doc as formatted output.
Use cases:
- Compiling research summaries from a list of articles or blog posts
- Monitoring competitor content by scraping and summarizing their pages regularly
- Building daily content digests or briefing documents from multiple web sources
How It Works
| Step | What Happens |
|---|---|
| 1. Scheduled Trigger | Fires daily at 9:00 AM UTC (configurable cron schedule) |
| 2. Google Sheets: Get Values | Reads all rows from your URL spreadsheet, including headers |
| 3. Code (JavaScript) | Parses the raw sheet data into objects using the first row as column headers |
| 4. Browse Web | Fetches the full HTML content of each URL |
| 5. AI Agent (GPT-4.1) | Produces a structured JSON summary with URL, Title, Key Points, Quotes, Links, and Summary |
| 6. Transform | Strips any markdown code fences from the AI output and parses the JSON |
| 7. Code (JavaScript) | Flattens arrays (Key Points, Quotes, Links) into pipe-separated strings |
| 8. Google Docs: Append Text | Writes the formatted summaries to your Google Doc |
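Step 6 can be pictured as a small function that removes a surrounding code fence, if there is one, and then parses what remains as JSON. The source does not show the Transform node's actual expression, so this is only a sketch in plain JavaScript; the names `parseAgentOutput` and `FENCE` are hypothetical.

```javascript
// Three backticks, built indirectly so the example stays easy to read.
const FENCE = "`".repeat(3);

// Hypothetical helper: strip an optional surrounding code fence, then parse JSON.
function parseAgentOutput(text) {
  let cleaned = String(text).trim();
  if (cleaned.startsWith(FENCE)) {
    // Drop the opening fence line, including any language tag after the fence.
    const firstNewline = cleaned.indexOf("\n");
    cleaned = firstNewline === -1 ? "" : cleaned.slice(firstNewline + 1);
    // Drop the closing fence if present.
    if (cleaned.trimEnd().endsWith(FENCE)) {
      cleaned = cleaned.trimEnd().slice(0, -FENCE.length);
    }
  }
  // JSON.parse throws a SyntaxError if the remaining text is not valid JSON.
  return JSON.parse(cleaned.trim());
}

const fenced = FENCE + "json\n" + '{"Title":"Hello"}' + "\n" + FENCE;
const fromFenced = parseAgentOutput(fenced);
const fromPlain = parseAgentOutput('{"Title":"Plain"}');
console.log(fromFenced.Title, fromPlain.Title);
```

Handling both fenced and unfenced input keeps the step tolerant of a model that sometimes wraps its JSON and sometimes does not.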
Workflow Nodes
| Node | Role |
|---|---|
| Scheduled Trigger | Runs the workflow daily at 9:00 AM UTC |
| Google Sheets: Get Values in Range | Reads all data from the URL spreadsheet including headers |
| Code (JavaScript) | Maps raw rows into structured objects using headers as keys |
| Browse Web | Fetches webpage content for each URL |
| AI Agent (GPT-4.1) | Extracts structured JSON with title, key points, quotes, links, and summary |
| Transform | Parses the AI's JSON output, removing any code fence formatting |
| Code (JavaScript) | Flattens array fields into pipe-separated strings for clean output |
| Google Docs: Append Text | Appends the final formatted summaries to your Google Doc |
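The flattening done by the second Code node might look roughly like the following. This is a sketch, not the template's actual code: the helper name `flattenSummary`, the sample values, and the exact separator (a pipe with a space on each side) are assumptions.

```javascript
// Hypothetical helper: join every array field into one pipe-separated string.
function flattenSummary(summary) {
  return Object.fromEntries(
    Object.entries(summary).map(([key, value]) => [
      key,
      Array.isArray(value) ? value.join(" | ") : value,
    ])
  );
}

// Made-up sample in the shape the AI Agent is described as returning.
const summary = {
  URL: "https://example.com/post",
  Title: "Example post",
  "Key Points": ["a", "b", "c"],
  Quotes: [],
  Links: ["https://example.com/x"],
  Summary: "Short.",
};

const flat = flattenSummary(summary);
console.log(flat["Key Points"]);
```

Non-array fields pass through unchanged, and an empty array becomes an empty string.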
Setup Instructions
- Add the "Scrape and Summarize Websites" workflow template to your Needle workspace
- Connect your Google Sheets account and create a spreadsheet with a "URL" column header in row 1 and your URLs in the rows below, starting at row 2
- Connect your Google Docs account and choose the document that should receive the summaries
- In the Google Sheets and Google Docs nodes, replace the existing URLs with the URLs of your own spreadsheet and document
Customization
| What You Can Change | How |
|---|---|
| Schedule frequency | Edit the Scheduled Trigger's cron expression (default is `0 9 * * *`, daily at 9:00 AM UTC) |
| Timezone | Change the timezone setting in the Scheduled Trigger node |
| Summary structure | Modify the AI Agent's prompt to request different fields or a different output format |
| AI model | Change the model in the AI Agent node (default is GPT-4.1) |
| Output destination | Replace the Google Docs node with a different output tool, like Google Sheets or Slack |
| URL spreadsheet | Update the Google Sheets URL in the Get Values node to point to your own sheet |
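For the schedule row above, it can help to see how a five-field cron expression breaks down: minute, hour, day of month, month, day of week. The sketch below only splits an expression into named fields; `describeCron` is a hypothetical helper, and the weekday expression is a standard cron example, not something the template ships with. The source does not say which cron features the Scheduled Trigger accepts beyond the default.

```javascript
// Hypothetical helper: split a standard five-field cron expression into named parts.
function describeCron(expression) {
  const parts = expression.trim().split(/\s+/);
  if (parts.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${parts.length}`);
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] = parts;
  return { minute, hour, dayOfMonth, month, dayOfWeek };
}

// The template's default: minute 0, hour 9, every day.
const daily = describeCron("0 9 * * *");

// Assumed alternative in standard cron syntax: same time, Monday to Friday only.
const weekdays = describeCron("0 9 * * 1-5");

console.log(daily, weekdays);
```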
FAQ
Q: What format does the input spreadsheet need? A: The first row should contain column headers, with at least a "URL" column. Put your URLs starting from row 2. The Code node automatically maps headers to values.
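As an illustration of that header mapping, here is a minimal sketch that turns a 2D array of sheet values into one object per row. It assumes the sheet data arrives as an array of rows with the header row first; the function name `rowsToObjects` and the sample data are hypothetical, and the template's real Code node may differ.

```javascript
// Hypothetical helper: use the first row as keys for every following row.
function rowsToObjects(values) {
  if (!Array.isArray(values) || values.length === 0) return [];
  const [headers, ...rows] = values;
  return rows
    // Skip rows that are completely empty.
    .filter((row) => row.some((cell) => String(cell ?? "").trim() !== ""))
    .map((row) =>
      Object.fromEntries(
        // Cells missing at the end of a short row become empty strings.
        headers.map((header, i) => [String(header).trim(), row[i] ?? ""])
      )
    );
}

const sample = [
  ["URL", "Notes"],
  ["https://example.com/a", "first"],
  ["https://example.com/b"],
  [],
];

const parsed = rowsToObjects(sample);
console.log(parsed);
```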
Q: How does the AI structure its output? A: The AI is prompted to return a JSON object with six fields: URL, Title, Key Points (array of 3-7 bullets), Quotes (array of 0-3 quotes), Links (array of 0-5 outbound links), and Summary (2-4 sentences).
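If you want to confirm that a response actually matches that shape before writing it out, a check along these lines is one option. It is a sketch only: the template is not described as including a validation step, and `validateSummary`, `LIMITS`, and the sample objects are hypothetical. The item counts mirror the ranges listed in the answer above.

```javascript
// Allowed item counts for the array fields, taken from the ranges described above.
const LIMITS = { "Key Points": [3, 7], Quotes: [0, 3], Links: [0, 5] };

// Hypothetical helper: return a list of problems; an empty list means the shape is fine.
function validateSummary(obj) {
  const problems = [];
  for (const field of ["URL", "Title", "Summary"]) {
    if (typeof obj[field] !== "string" || obj[field].trim() === "") {
      problems.push(`${field} should be a non-empty string`);
    }
  }
  for (const [field, [min, max]] of Object.entries(LIMITS)) {
    const value = obj[field];
    if (!Array.isArray(value)) {
      problems.push(`${field} should be an array`);
    } else if (value.length < min || value.length > max) {
      problems.push(`${field} should have ${min}-${max} items, got ${value.length}`);
    }
  }
  return problems;
}

const good = {
  URL: "https://example.com/post",
  Title: "Example post",
  "Key Points": ["one", "two", "three"],
  Quotes: [],
  Links: [],
  Summary: "A short summary.",
};
const bad = { ...good, "Key Points": ["only one"] };

console.log(validateSummary(good), validateSummary(bad));
```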
Q: Can I run this manually instead of on a schedule? A: Yes. Replace the Scheduled Trigger with a Manual Trigger if you want to run it on demand.
Q: What happens if a URL is unreachable? A: The Browse Web node attempts to fetch every URL. If a page is unreachable, that item may fail or come back with empty content. In the empty case the AI Agent still processes whatever it receives, so the summary for that URL may be blank or unreliable.
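One way to keep empty pages out of the summaries is to separate them before the AI step, for example in an extra Code node. The sketch below is an assumption about how that could be done, not part of the template: `splitByContent` is a hypothetical helper, and the `content` field name is made up because the source does not describe the Browse Web node's output shape.

```javascript
// Hypothetical helper: separate items with usable page text from empty ones.
function splitByContent(items, minLength = 1) {
  const usable = [];
  const skipped = [];
  for (const item of items) {
    // Treat missing or non-string content as empty.
    const text = typeof item.content === "string" ? item.content.trim() : "";
    if (text.length >= minLength) {
      usable.push(item);
    } else {
      skipped.push(item);
    }
  }
  return { usable, skipped };
}

// Made-up sample items.
const fetched = [
  { url: "https://example.com/a", content: "<html>Some page text</html>" },
  { url: "https://example.com/b", content: "   " },
  { url: "https://example.com/c" },
];

const { usable, skipped } = splitByContent(fetched);
console.log(usable.length, skipped.map((item) => item.url));
```

Keeping the skipped items in their own list, instead of dropping them, makes it possible to log or retry the failed URLs later.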