Workflow

Scrape and Summarize News Website

Automate news article scraping from websites without RSS feeds. The workflow browses news sites, extracts article links, fetches full content, uses AI to summarize each article with key points and sentiment analysis, then exports structured data to Google Sheets for easy tracking and analysis.

Needle Team

Last updated

October 1, 2025

Connectors used

Google Sheets

Tags

News Aggregation, Content Scraping, Media Monitoring, AI News Analysis

Key Takeaways

  • No RSS required - Scrapes news directly from website HTML, so you can monitor sites that do not offer RSS feeds
  • AI-powered summaries - Each article is summarized by GPT-4.1 with key points, category, and sentiment analysis
  • Structured Google Sheets output - All scraped articles are exported to a spreadsheet with nine columns for easy filtering and review
  • Runs on a schedule - A daily trigger (default 9 AM UTC) keeps your news tracker up to date automatically

What This Workflow Does

This workflow visits a list of news websites you configure, scrapes the latest article links, fetches the full content of each article, and uses AI to produce a structured summary. The results are written to Google Sheets with fields like title, source, URL, summary, key points, category, and sentiment. It is built for sites that do not provide RSS feeds, so you can track almost any news source on the web.

Use cases:

  • Monitoring industry or competitor news from sites without RSS
  • Building a daily or weekly news digest for your team
  • Tracking sentiment and categories across multiple news sources

How It Works

Step | What Happens
1. Scheduled Trigger | The workflow fires on a cron schedule (default: daily at 9 AM UTC)
2. News Sites Config | A JavaScript code node defines the list of target news sites with their URLs and CSS selectors
3. Browse News Page | Each site's homepage or news section is fetched using the Browse Web tool
4. Extract Articles | A code node parses the HTML and extracts up to 10 article links per site
5. Browse Article | The full content of each article is fetched
6. AI Summarize | GPT-4.1 reads the article and returns a JSON object with title, source, URL, date, summary, key points, category, and sentiment
7. Transform and Flatten | The AI output is parsed and flattened into a row-per-article format for Google Sheets
8. Google Sheets Export | Rows are written to Google Sheets, updating existing rows by URL or appending new ones
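
Step 4 can be pictured roughly as follows. This is a minimal sketch, not the template's actual code: the function name `extractArticleLinks` is hypothetical, and a regular expression stands in for CSS-selector matching so the example runs in plain Node.js without an HTML parsing library.

```javascript
// Hypothetical helper: collect up to `limit` unique article URLs from raw HTML.
// Assumption: a regex over <a href="..."> tags replaces real CSS-selector matching.
function extractArticleLinks(html, baseUrl, limit = 10) {
  const hrefPattern = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/gi;
  const seen = new Set();
  const links = [];
  for (const match of html.matchAll(hrefPattern)) {
    if (links.length >= limit) break;
    let url;
    try {
      // Resolve relative paths such as "/news/story-1" against the site URL.
      url = new URL(match[1], baseUrl);
    } catch {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    // Ignore mailto:, javascript:, and other non-web links.
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
    url.hash = ''; // treat "#comments" variants as the same article
    if (seen.has(url.href)) continue;
    seen.add(url.href);
    links.push(url.href);
  }
  return links;
}
```

Deduplicating before the limit is applied means repeated teaser links on a homepage do not use up the 10-article budget.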

Workflow Nodes

Node | Role
Scheduled Trigger | Fires the workflow on a cron schedule (default daily 9 AM UTC)
News Sites Config | JavaScript code node that defines target sites, URLs, and CSS selectors
Browse News Page | Fetches the homepage or news section of each configured site
Extract Articles | JavaScript code node that parses HTML and extracts article links (up to 10 per site)
Browse Article | Fetches the full content of each individual article
AI Summarize | GPT-4.1 node that produces a structured JSON summary per article
Transform Parse | Parses the JSON strings returned by the AI node
Flatten for Sheets | Code node that maps each summary into a flat row with nine columns
Google Sheets Export | Writes or updates rows in the target Google Sheet
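
The parsing done by a node like Transform Parse might look something like the sketch below. The helper name `parseSummary` is hypothetical, and the fence-stripping step is an assumption: it guards against a model wrapping its JSON in a Markdown code fence, which may or may not happen in this template.

```javascript
// Three backticks, built indirectly so this example is easy to embed in docs.
const FENCE = '`'.repeat(3);

// Hypothetical helper: turn the AI node's raw text into an object, or null if unusable.
function parseSummary(raw) {
  if (typeof raw !== 'string') return null;
  let text = raw.trim();
  if (text.startsWith(FENCE)) {
    // Drop the opening fence line (with its optional "json" tag)...
    text = text.slice(text.indexOf('\n') + 1);
    // ...and the closing fence, if present.
    if (text.trimEnd().endsWith(FENCE)) {
      text = text.trimEnd().slice(0, -FENCE.length);
    }
  }
  try {
    const parsed = JSON.parse(text);
    const isPlainObject =
      parsed !== null && typeof parsed === 'object' && !Array.isArray(parsed);
    return isPlainObject ? parsed : null;
  } catch {
    return null; // malformed JSON: let the caller skip this article
  }
}
```

Returning `null` instead of throwing lets a later step drop one bad summary without failing the whole run.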

Setup Instructions

  1. Add the "Scrape and Summarize News Website" template to your Needle workspace
  2. Open the News Sites Config code node and add your target news sites with their URLs and CSS selectors for articles, titles, and links
  3. Connect your Google Sheets account and point the export node to the spreadsheet where you want results saved
  4. Adjust the cron schedule on the Scheduled Trigger node if you prefer a different frequency (weekly, hourly, etc.)
  5. Run the workflow manually once to verify results appear in your spreadsheet
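
For step 2, the site list could be shaped along these lines. The field names (`name`, `url`, `selectors.article`, `selectors.title`, `selectors.link`), the example site, and the `validateSites` helper are all illustrative assumptions; check the template's own code node for the exact property names it expects.

```javascript
// Hypothetical shape for the News Sites Config node; field names are illustrative.
const newsSites = [
  {
    name: 'Example News',
    url: 'https://news.example.com/latest',
    selectors: {
      article: 'article.story', // container for one article teaser
      title: 'h2.story-title',  // headline inside the container
      link: 'a.story-link',     // anchor pointing at the full article
    },
  },
];

// Return a list of human-readable problems so a bad entry is caught before a run.
function validateSites(sites) {
  const problems = [];
  sites.forEach((site, i) => {
    const label = site.name || `site #${i + 1}`;
    try {
      new URL(site.url); // throws if the URL is missing or malformed
    } catch {
      problems.push(`${label}: invalid url`);
    }
    for (const key of ['article', 'title', 'link']) {
      const value = site.selectors && site.selectors[key];
      if (typeof value !== 'string' || value.trim() === '') {
        problems.push(`${label}: missing selector "${key}"`);
      }
    }
  });
  return problems;
}
```

Running a check like this during the manual test in step 5 can surface a typo in a selector or URL before the schedule takes over.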

Customization

What You Can Change | How
Target news sites | Edit the News Sites Config code node to add or remove sites and update CSS selectors
Articles per site | Change the limit in the Extract Articles code node (default is 10)
Schedule frequency | Modify the cron expression on the Scheduled Trigger node (e.g., weekly instead of daily)
AI model | Swap the model in the AI Summarize node if you prefer a different provider
Output columns | Edit the Flatten for Sheets code node to add or remove columns in the Google Sheets output
Summary format | Adjust the AI prompt to change what fields are included in each summary
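
For the schedule, a few common expressions are sketched below. This assumes the Scheduled Trigger accepts standard five-field cron syntax (minute, hour, day of month, month, day of week), which this page does not confirm; the `schedules` object and `looksLikeCron` helper are hypothetical.

```javascript
// Standard five-field cron: minute hour day-of-month month day-of-week.
const schedules = {
  dailyAt9: '0 9 * * *',     // every day at 09:00 (the template default, in UTC)
  weeklyMonday: '0 9 * * 1', // Mondays at 09:00
  hourly: '0 * * * *',       // at minute 0 of every hour
};

// Light sanity check only: five whitespace-separated fields of cron characters.
// It does not validate ranges (for example, it would accept hour 99).
function looksLikeCron(expr) {
  const fields = expr.trim().split(/\s+/);
  return fields.length === 5 && fields.every((f) => /^[\d*\/,\-]+$/.test(f));
}
```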

FAQ

Q: What if a news site changes its HTML structure?
A: You will need to update the CSS selectors in the News Sites Config code node. Use your browser's developer tools to find the correct selectors for article links and titles.

Q: Can I monitor more than one site at a time?
A: Yes. The News Sites Config node accepts an array of site objects, so you can add as many sites as you need.

Q: What columns does the Google Sheet contain?
A: The default output has nine columns: Title, Source, URL, Date, Summary, Key Points (pipe-separated), Category, Sentiment, and Scraped At (timestamp).
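
As a rough illustration of how one summary could map onto those nine columns, here is a minimal sketch. The input property names (`title`, `keyPoints`, and so on), the `" | "` separator spacing, and the ISO timestamp format are assumptions; the template's Flatten for Sheets node may differ.

```javascript
// The nine default columns, in the order listed above.
const COLUMNS = [
  'Title', 'Source', 'URL', 'Date', 'Summary',
  'Key Points', 'Category', 'Sentiment', 'Scraped At',
];

// Hypothetical helper: map one parsed AI summary to a flat row object.
function flattenForSheets(summary, scrapedAt = new Date()) {
  const keyPoints = Array.isArray(summary.keyPoints) ? summary.keyPoints : [];
  return {
    'Title': summary.title ?? '',
    'Source': summary.source ?? '',
    'URL': summary.url ?? '',
    'Date': summary.date ?? '',
    'Summary': summary.summary ?? '',
    'Key Points': keyPoints.join(' | '), // a list cannot live in one cell, so join it
    'Category': summary.category ?? '',
    'Sentiment': summary.sentiment ?? '',
    'Scraped At': scrapedAt.toISOString(),
  };
}
```

Defaulting missing fields to an empty string keeps every row the same width even when the model omits a field.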

Q: Does the workflow handle duplicate articles?
A: The Google Sheets export node is configured to update existing rows if a row with the same URL already exists, and append new rows otherwise.
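
The update-or-append behavior can be modeled in memory as below. This is only a sketch of the logic, using a hypothetical `upsertByUrl` function over plain arrays; the real export node talks to Google Sheets and its internals are not shown on this page.

```javascript
// In-memory stand-in for the sheet: update the row whose URL matches, otherwise append.
function upsertByUrl(existingRows, incomingRows) {
  const rows = existingRows.map((row) => ({ ...row })); // copy so the input is untouched
  const indexByUrl = new Map(rows.map((row, i) => [row.URL, i]));
  for (const incoming of incomingRows) {
    const at = indexByUrl.get(incoming.URL);
    if (at === undefined) {
      // New article: remember where it lands, then append it.
      indexByUrl.set(incoming.URL, rows.length);
      rows.push({ ...incoming });
    } else {
      // Known article: overwrite its fields with the fresh values.
      rows[at] = { ...rows[at], ...incoming };
    }
  }
  return rows;
}
```

Because the key is the URL, an article that is re-scraped on a later run refreshes its existing row instead of creating a second one.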
