Scrape and Summarize News Website
Automate news article scraping from websites without RSS feeds. The workflow browses news sites, extracts article links, fetches full content, uses AI to summarize each article with key points and sentiment analysis, then exports structured data to Google Sheets for easy tracking and analysis.
Last updated: October 1, 2025
Key Takeaways
- No RSS required - Scrapes news directly from website HTML, so you can monitor sites that do not offer RSS feeds
- AI-powered summaries - Each article is summarized by GPT-4.1 with key points, category, and sentiment analysis
- Structured Google Sheets output - All scraped articles are exported to a spreadsheet with nine columns for easy filtering and review
- Runs on a schedule - A daily trigger (default 9 AM UTC) keeps your news tracker up to date automatically
What This Workflow Does
This workflow visits a list of news websites you configure, scrapes the latest article links, fetches the full content of each article, and uses AI to produce a structured summary. The results are written to Google Sheets with fields like title, source, URL, summary, key points, category, and sentiment. It is built for sites that do not provide RSS feeds, so you can track almost any news source on the web.
Use cases:
- Monitoring industry or competitor news from sites without RSS
- Building a daily or weekly news digest for your team
- Tracking sentiment and categories across multiple news sources
How It Works
| Step | What Happens |
|---|---|
| 1. Scheduled Trigger | The workflow fires on a cron schedule (default: daily at 9 AM UTC) |
| 2. News Sites Config | A JavaScript code node defines the list of target news sites with their URLs and CSS selectors |
| 3. Browse News Page | Each site's homepage or news section is fetched using the Browse Web tool |
| 4. Extract Articles | A code node parses the HTML and extracts up to 10 article links per site |
| 5. Browse Article | The full content of each article is fetched |
| 6. AI Summarize | GPT-4.1 reads the article and returns a JSON object with title, source, URL, date, summary, key points, category, and sentiment |
| 7. Transform and Flatten | The AI output is parsed and flattened into a row-per-article format for Google Sheets |
| 8. Google Sheets Export | Rows are written to Google Sheets, updating existing rows by URL or appending new ones |
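The News Sites Config node in step 2 returns an array of site objects. A minimal sketch of what that code node might contain is below; the field names (`articleSelector`, `titleSelector`, `maxArticles`) are illustrative assumptions, not the template's exact schema, so align them with whatever your Extract Articles node reads:

```javascript
// Hypothetical sketch of the News Sites Config code node.
// Field names are assumptions; match them to the selectors the
// Extract Articles node actually expects.
function newsSitesConfig() {
  return [
    {
      name: "Example Tech News",
      url: "https://example.com/news",
      articleSelector: "article.post", // container for one article teaser
      titleSelector: "h2 a",           // title/link element inside the teaser
      maxArticles: 10                  // cap enforced later by Extract Articles
    }
  ];
}

const sites = newsSitesConfig();
console.log(sites);
```

Each object describes one site to monitor, so adding a new source is just adding another entry to the array.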
Workflow Nodes
| Node | Role |
|---|---|
| Scheduled Trigger | Fires the workflow on a cron schedule (default daily 9 AM UTC) |
| News Sites Config | JavaScript code node that defines target sites, URLs, and CSS selectors |
| Browse News Page | Fetches the homepage or news section of each configured site |
| Extract Articles | JavaScript code node that parses HTML and extracts article links (up to 10 per site) |
| Browse Article | Fetches the full content of each individual article |
| AI Summarize | GPT-4.1 node that produces a structured JSON summary per article |
| Transform Parse | Parses the JSON strings returned by the AI node |
| Flatten for Sheets | Code node that maps each summary into a flat row with nine columns |
| Google Sheets Export | Writes or updates rows in the target Google Sheet |
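To make the Extract Articles step concrete, here is a minimal, dependency-free sketch of its core logic: pull anchor tags out of fetched HTML and keep at most 10 links. The real node presumably applies the per-site CSS selectors with a proper HTML parser; a regex is used here only to keep the example self-contained.

```javascript
// Sketch of the Extract Articles step: find links in raw HTML and cap the
// result at `limit` articles. Regex-based parsing is fragile on real pages;
// this only illustrates the shape of the output.
function extractArticleLinks(html, baseUrl, limit = 10) {
  const links = [];
  const anchorRe = /<a\s[^>]*href="([^"]+)"[^>]*>([^<]+)<\/a>/gi;
  let match;
  while ((match = anchorRe.exec(html)) !== null && links.length < limit) {
    const url = new URL(match[1], baseUrl).href; // resolve relative links
    const title = match[2].trim();
    if (title) links.push({ title, url });
  }
  return links;
}

const sampleHtml =
  '<a href="/story-1">First story</a><a href="/story-2">Second story</a>';
const articles = extractArticleLinks(sampleHtml, "https://example.com");
console.log(articles);
```

Each extracted `{ title, url }` pair then feeds the Browse Article node, which fetches the full text for summarization.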
Setup Instructions
- Add the "Scrape and Summarize News Website" template to your Needle workspace
- Open the News Sites Config code node and add your target news sites with their URLs and CSS selectors for articles, titles, and links
- Connect your Google Sheets account and point the export node to the spreadsheet where you want results saved
- Adjust the cron schedule on the Scheduled Trigger node if you prefer a different frequency (weekly, hourly, etc.)
- Run the workflow manually once to verify results appear in your spreadsheet
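The schedule in step 4 uses standard five-field cron syntax. The template default of daily at 9 AM UTC corresponds to the first expression below; the others are common alternatives:

```text
0 9 * * *    # daily at 09:00 UTC (template default)
0 9 * * 1    # weekly, Mondays at 09:00 UTC
0 * * * *    # hourly, on the hour
```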
Customization
| What You Can Change | How |
|---|---|
| Target news sites | Edit the News Sites Config code node to add or remove sites and update CSS selectors |
| Articles per site | Change the limit in the Extract Articles code node (default is 10) |
| Schedule frequency | Modify the cron expression on the Scheduled Trigger node (e.g., weekly instead of daily) |
| AI model | Swap the model in the AI Summarize node if you prefer a different provider |
| Output columns | Edit the Flatten for Sheets code node to add or remove columns in the Google Sheets output |
| Summary format | Adjust the AI prompt to change what fields are included in each summary |
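Editing the output columns means editing the Flatten for Sheets code node, so a sketch of that mapping is useful. The column names below follow the nine defaults listed in the FAQ; the lower-case field names on the summary object are assumptions about the AI node's JSON output:

```javascript
// Sketch of the Flatten for Sheets step: turn one parsed AI summary into a
// flat nine-column row. Column names follow the template defaults; the
// summary field names are assumed, not confirmed.
function flattenSummary(summary) {
  return {
    Title: summary.title,
    Source: summary.source,
    URL: summary.url,
    Date: summary.date,
    Summary: summary.summary,
    "Key Points": summary.keyPoints.join(" | "), // pipe-separated list
    Category: summary.category,
    Sentiment: summary.sentiment,
    "Scraped At": new Date().toISOString()       // when this row was produced
  };
}

const row = flattenSummary({
  title: "Example headline",
  source: "Example Tech News",
  url: "https://example.com/story-1",
  date: "2025-10-01",
  summary: "A one-paragraph summary of the article.",
  keyPoints: ["Point one", "Point two"],
  category: "Technology",
  sentiment: "neutral"
});
console.log(row);
```

Adding a column is a matter of adding one more key to the returned object and making sure the AI prompt produces the matching field.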
FAQ
Q: What if a news site changes its HTML structure?
A: You will need to update the CSS selectors in the News Sites Config code node. Use your browser's developer tools to find the correct selectors for article links and titles.
Q: Can I monitor more than one site at a time?
A: Yes. The News Sites Config node accepts an array of site objects, so you can add as many sites as you need.
Q: What columns does the Google Sheet contain?
A: The default output has nine columns: Title, Source, URL, Date, Summary, Key Points (pipe-separated), Category, Sentiment, and Scraped At (timestamp).
Q: Does the workflow handle duplicate articles?
A: The Google Sheets export node is configured to update existing rows if a row with the same URL already exists, and append new rows otherwise.
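The Google Sheets node handles that update-or-append behavior natively, but the keyed-by-URL logic it applies can be sketched in plain JavaScript:

```javascript
// Sketch of upsert-by-URL: re-scraping a known article refreshes its row
// instead of adding a duplicate; unseen URLs are appended.
function upsertRow(rows, newRow) {
  const index = rows.findIndex(r => r.URL === newRow.URL);
  if (index === -1) {
    rows.push(newRow);    // new article: append
  } else {
    rows[index] = newRow; // known URL: update in place
  }
  return rows;
}

const sheet = [{ URL: "https://example.com/a", Title: "Old title" }];
upsertRow(sheet, { URL: "https://example.com/a", Title: "New title" });
upsertRow(sheet, { URL: "https://example.com/b", Title: "Second story" });
console.log(sheet);
```

Because URL is the key, the same article summarized twice never produces two rows, but an edited headline or updated sentiment does overwrite the older values.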
Want to showcase your own workflows?
Become a Needle workflow partner and turn your expertise into recurring revenue.