Scrape and Summarize Websites
Automatically scrape webpages from a Google Sheets URL list, use AI to extract structured summaries with key points, quotes, and links, then write results to Google Docs for content research and briefing.
Key Takeaways
- Scheduled daily scraping: runs automatically every day at 9:00 AM UTC to process new URLs from your spreadsheet
- Structured AI summaries: GPT-4.1 extracts title, key points, quotes, links, and a 2-4 sentence summary for each page
- Google Sheets to Google Docs pipeline: reads URLs from a spreadsheet and writes formatted summaries to a Google Doc
- Automatic data parsing: Code nodes handle header mapping and array flattening so the output is clean and readable
What This Workflow Does
This Needle workflow reads a list of URLs from a Google Sheets spreadsheet, scrapes each webpage using the built-in Browse Web tool, and then sends the content to GPT-4.1 for structured summarization. The AI extracts the title, key points, notable quotes, important outbound links, and a short summary for each page. The results are cleaned up by code and transform nodes, then appended to a Google Doc as formatted output.
Use cases:
- Compiling research summaries from a list of articles or blog posts
- Monitoring competitor content by scraping and summarizing their pages regularly
- Building daily content digests or briefing documents from multiple web sources
How It Works
| Step | What Happens |
|---|---|
| 1. Scheduled Trigger | Fires daily at 9:00 AM UTC (configurable cron schedule) |
| 2. Google Sheets: Get Values | Reads all rows from your URL spreadsheet, including headers |
| 3. Code (JavaScript) | Parses the raw sheet data into objects using the first row as column headers |
| 4. Browse Web | Fetches the full HTML content of each URL |
| 5. AI Agent (GPT-4.1) | Produces a structured JSON summary with URL, Title, Key Points, Quotes, Links, and Summary |
| 6. Transform | Strips any Markdown code fences from the AI output and parses the JSON |
| 7. Code (JavaScript) | Flattens arrays (Key Points, Quotes, Links) into pipe-separated strings |
| 8. Google Docs: Append Text | Writes the formatted summaries to your Google Doc |
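Step 6 can be pictured with a short sketch. This is not the template's actual Transform expression; the helper name parseAiJson and the exact fence pattern are assumptions made for illustration. It removes an optional opening and closing code fence (with or without a language tag) and then parses what remains as JSON.

```javascript
// Hypothetical helper, not the template's real Transform logic.
// The fence marker is built with repeat() so this listing stays easy to embed.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(
  "^" + FENCE + "[a-zA-Z]*\\s*([\\s\\S]*?)\\s*" + FENCE + "$"
);

function parseAiJson(raw) {
  const text = String(raw).trim();
  // If the whole reply is wrapped in a fence, keep only the inner body.
  const match = text.match(FENCED_BLOCK);
  const body = match ? match[1] : text;
  // JSON.parse throws a SyntaxError if the body is not valid JSON.
  return JSON.parse(body);
}
```

Replies that arrive without fences pass straight through to JSON.parse, so the same helper covers both cases.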
Workflow Nodes
| Node | Role |
|---|---|
| Scheduled Trigger | Runs the workflow daily at 9:00 AM UTC |
| Google Sheets: Get Values in Range | Reads all data from the URL spreadsheet including headers |
| Code (JavaScript) | Maps raw rows into structured objects using headers as keys |
| Browse Web | Fetches webpage content for each URL |
| AI Agent (GPT-4.1) | Extracts structured JSON with title, key points, quotes, links, and summary |
| Transform | Parses the AI's JSON output, removing any code fence formatting |
| Code (JavaScript) | Flattens array fields into pipe-separated strings for clean output |
| Google Docs: Append Text | Appends the final formatted summaries to your Google Doc |
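The flattening performed by the second Code node might look roughly like the sketch below. The function name flattenSummary and the " | " separator (pipe with surrounding spaces) are assumptions; the template's own code may use a different separator or handle more fields.

```javascript
// Illustrative only: field names follow the six fields described in this guide.
const ARRAY_FIELDS = ["Key Points", "Quotes", "Links"];

function flattenSummary(summary) {
  // Copy the object so the original item is left unchanged.
  const flat = { ...summary };
  for (const field of ARRAY_FIELDS) {
    const value = flat[field];
    // Join arrays into one string; leave non-array values untouched.
    if (Array.isArray(value)) {
      flat[field] = value.join(" | ");
    }
  }
  return flat;
}
```

An empty array becomes an empty string, which keeps the appended text free of stray brackets.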
Setup Instructions
- Add the "Scrape and Summarize Websites" workflow template to your Needle workspace
- Connect your Google Sheets account and create a spreadsheet with a "URL" column header in row 1 and URLs in the rows below
- Connect your Google Docs account and set the document URL where summaries should be written
- Update the Google Sheets and Google Docs URLs in the respective nodes to point to your own documents
Customization
| What You Can Change | How |
|---|---|
| Schedule frequency | Edit the Scheduled Trigger's cron expression (default is 0 9 * * * for daily at 9 AM UTC) |
| Timezone | Change the timezone setting in the Scheduled Trigger node |
| Summary structure | Modify the AI Agent's prompt to request different fields or a different output format |
| AI model | Change the model in the AI Agent node (default is GPT-4.1) |
| Output destination | Replace the Google Docs node with a different output tool, like Google Sheets or Slack |
| URL spreadsheet | Update the Google Sheets URL in the Get Values node to point to your own sheet |
FAQ
Q: What format does the input spreadsheet need? A: The first row should contain column headers, with at least a "URL" column. Put your URLs starting from row 2. The Code node automatically maps headers to values.
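As a rough sketch of that header mapping, assuming the sheet data arrives as a two-dimensional array whose first row holds the headers (the function name rowsToObjects is hypothetical, and the template's Code node may differ):

```javascript
// Assumes `values` is a 2D array: first row = headers, remaining rows = data.
function rowsToObjects(values) {
  if (!Array.isArray(values) || values.length === 0) return [];
  const [headers, ...rows] = values;
  return rows.map((row) =>
    Object.fromEntries(
      // Rows can be shorter than the header row when trailing cells are
      // empty, so missing cells default to an empty string.
      headers.map((header, i) => [String(header).trim(), row[i] ?? ""])
    )
  );
}

const sampleRows = rowsToObjects([
  ["URL", "Notes"],
  ["https://example.com", "first"],
  ["https://example.org"],
]);
```

Each resulting object has a URL property taken from the "URL" column, which later steps can pass to Browse Web.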
Q: How does the AI structure its output? A: The AI is prompted to return a JSON object with six fields: URL, Title, Key Points (array of 3-7 bullets), Quotes (array of 0-3 quotes), Links (array of 0-5 outbound links), and Summary (2-4 sentences).
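To make that shape concrete, here is a made-up example object together with a small check of the stated ranges. The validator is not part of the template; validateSummary and the sample values are hypothetical and only illustrate the six fields and their limits.

```javascript
// Hypothetical validator; limits mirror the ranges stated in the answer above.
const ARRAY_LIMITS = { "Key Points": [3, 7], Quotes: [0, 3], Links: [0, 5] };

function validateSummary(summary) {
  const problems = [];
  for (const field of ["URL", "Title", "Summary"]) {
    if (typeof summary[field] !== "string" || summary[field].trim() === "") {
      problems.push(field + " must be a non-empty string");
    }
  }
  for (const [field, [min, max]] of Object.entries(ARRAY_LIMITS)) {
    const value = summary[field];
    if (!Array.isArray(value)) {
      problems.push(field + " must be an array");
    } else if (value.length < min || value.length > max) {
      problems.push(field + " must have " + min + "-" + max + " items");
    }
  }
  return problems;
}

// Invented sample data, for illustration only.
const exampleSummary = {
  URL: "https://example.com/post",
  Title: "Example post",
  "Key Points": ["First point", "Second point", "Third point"],
  Quotes: [],
  Links: ["https://example.com/related"],
  Summary: "A made-up summary used only for illustration. It has two sentences.",
};
```

Calling validateSummary(exampleSummary) returns an empty list; any entries in the returned list describe fields that fall outside the requested structure.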
Q: Can I run this manually instead of on a schedule? A: Yes. Replace the Scheduled Trigger with a Manual Trigger if you want to run it on demand.
Q: What happens if a URL is unreachable? A: The Browse Web node attempts to fetch every URL in the list. If a page is unreachable, that item may fail or come back with empty content. The AI Agent summarizes whatever content it receives, so an empty page can produce an empty or unreliable summary.
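If empty pages turn out to be a problem, one option is an extra Code node between Browse Web and the AI Agent that sets aside items with no content. The sketch below is an assumption, not something the template ships with: the item shape (a URL plus a content string) and the function name splitByContent are hypothetical.

```javascript
// Hypothetical item shape: { URL: string, content: string }.
function splitByContent(items) {
  const usable = [];
  const skipped = [];
  for (const item of items) {
    // Treat missing, non-string, or whitespace-only content as empty.
    const text = typeof item.content === "string" ? item.content.trim() : "";
    (text.length > 0 ? usable : skipped).push(item);
  }
  return { usable, skipped };
}
```

Only the usable items would go on to the AI Agent, while the skipped list could be logged or reviewed separately.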