Access Protected WordPress Pages & Add to Knowledge Base
Scrape password-protected WordPress pages using browser cookies and automatically add the content to a Needle collection. Great for internal wikis and gated content.
Key Takeaways
- Scrapes authenticated pages - Access content behind login walls using your browser session cookies
- AI content extraction - Gemini strips navigation, headers, and footers, returning clean markdown
- Adds to your Needle collection - Scraped content is indexed for AI-powered semantic search
- Works with any cookie-authenticated site - WordPress, Drupal, internal wikis, and more
- One page per run - Handles a single page; extend with a loop for multiple pages
What This Workflow Does
This workflow fetches a password-protected web page using your browser's session cookies, extracts the main content with AI, and adds it to a Needle collection for semantic search. You copy a fetch() request from DevTools, and the workflow makes the authenticated request, converts the HTML to clean markdown, and indexes it in your knowledge base.
Use cases:
- Index internal wiki pages for AI-powered search
- Archive gated documentation or member-only content
- Add protected knowledge base articles to your Needle collection
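For reference, here is roughly what a copied snippet looks like; the domain, cookie names, and values below are all placeholders. Note that in Chrome the plain "Copy as fetch" option may omit the cookie header (the browser sends cookies implicitly), in which case "Copy as Node.js fetch" keeps it in the snippet:

```javascript
// Illustrative "Copy as fetch" output; every value here is a placeholder.
fetch("https://wiki.example.com/internal-handbook/", {
  "headers": {
    "accept": "text/html,application/xhtml+xml",
    "accept-language": "en-US,en;q=0.9",
    "cookie": "wordpress_logged_in_abc123=admin%7C1700000000%7C...; wp-settings-time-1=1700000000"
  },
  "referrer": "https://wiki.example.com/",
  "method": "GET",
  "mode": "cors",
  "credentials": "include"
});
```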
How It Works
| Step | What Happens |
|---|---|
| 1. Manual trigger | You paste a fetch() request copied from your browser's DevTools |
| 2. Parse fetch | Code node extracts the URL, method, headers, and cookies (see the sketch after this table) |
| 3. HTTP request | Fetches the protected page using your session authentication |
| 4. AI content extraction | Gemini extracts the main text content and formats it as markdown |
| 5. Add to collection | Converts to a markdown file and adds it to your Needle collection |
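Step 2 is where the pasted text becomes structured data. Here is a minimal sketch of such a Code node, assuming the snippet arrives on the incoming item as a fetchSnippet field and that DevTools emitted the options object as strict JSON (Chrome does); the field name and parsing approach are illustrative, not the template's exact code:

```javascript
// Sketch of a "Parse fetch" Code node (run once for all items).
const snippet = $input.first().json.fetchSnippet; // assumed field name

// URL: the first quoted string inside fetch(...)
const urlMatch = snippet.match(/fetch\(\s*["']([^"']+)["']/);
if (!urlMatch) throw new Error('No URL found in the pasted fetch() snippet');

// Options: everything from the first "{" to the last "}"
const optionsText = snippet.slice(snippet.indexOf('{'), snippet.lastIndexOf('}') + 1);
const options = JSON.parse(optionsText);

const headers = options.headers ?? {};
return [{
  json: {
    url: urlMatch[1],
    method: options.method ?? 'GET',
    headers,
    cookie: headers.cookie ?? headers.Cookie ?? '',
  },
}];
```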
Setup Instructions
- Click "Use template" on this page
- Log into the site with the protected content
- Open DevTools (F12) and go to the Network tab
- Navigate to the protected page
- Find the page request (usually the first one), right-click it, and choose "Copy as fetch"
- Paste the fetch() into the Manual Trigger node
- Select your target Needle collection in the last node
- Run the workflow
Customization
| What You Can Change | How |
|---|---|
| Target collection | Select a different Needle collection in the "Add Files" node |
| Content extraction | Edit the AI node prompt to focus on specific parts of the page |
| Multiple pages | Wrap the workflow in a loop with a list of URLs (see the sketch after this table) |
| Output format | Change the code node to produce a different file format |
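The multiple-pages customization can be sketched as a Code node placed after the trigger that fans out one item per URL; the downstream nodes then run once per item. The URLs and cookie value below are placeholders:

```javascript
// Hypothetical fan-out node for scraping several pages in one run.
// Paste your own cookie header from DevTools in place of this placeholder.
const cookie = 'wordpress_logged_in_abc123=...; wp-settings-time-1=...';

const urls = [
  'https://wiki.example.com/handbook/',
  'https://wiki.example.com/onboarding/',
  'https://wiki.example.com/runbooks/',
];

// n8n executes the downstream nodes once per returned item, so each URL
// flows through the HTTP request, AI extraction, and Needle steps in turn.
return urls.map((url) => ({
  json: { url, method: 'GET', headers: { cookie } },
}));
```

For longer lists, a Loop Over Items (Split In Batches) node in front of the HTTP request keeps the calls sequential, which reduces the chance of the session expiring mid-run or the site rate-limiting you.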
FAQ
Q: Does this work with any website? A: It works with any site that uses cookie-based authentication. Sites using JavaScript-only rendering may need the AI browser tool instead.
Q: How long do session cookies stay valid? A: It depends on the site; most sessions last between 24 hours and 30 days. WordPress login cookies default to 48 hours, or 14 days when "Remember Me" is checked. Re-copy the fetch() if your session expires.
Q: Can I scrape multiple pages at once? A: The template handles one page per run. You can extend it with a loop node to process a list of URLs.
Q: Is the scraped content searchable immediately? A: Yes, once added to your Needle collection, it is indexed and available for semantic search right away.
