Web scraping - Claude skills for journalism

When to use

Three-tier fallback: Trafilatura (fast), then Requests (plain HTTP), then Playwright (JavaScript rendering with stealth).

Detect paywalls, CAPTCHAs, rate limits, Cloudflare, and login walls with pattern matching.

Find and use hidden APIs via browser dev tools, with examples for autocomplete endpoints.

yt-dlp for YouTube/TikTok, instaloader for Instagram, with metadata extraction and download patterns.
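As a rough illustration of the metadata extraction mentioned above, the sketch below uses yt-dlp's Python interface to read video metadata without downloading the media. The helper names `fetch_info` and `summarize` and the choice of fields are assumptions made for this example, not necessarily what the skill itself ships.

```python
from typing import Any, Dict

# Fields kept in the summary; an illustrative selection, not a fixed schema.
FIELDS = ("id", "title", "uploader", "upload_date", "duration", "webpage_url")


def fetch_info(url: str) -> Dict[str, Any]:
    """Fetch metadata for one video without downloading the media file."""
    import yt_dlp  # imported lazily so summarize() works without yt-dlp installed

    opts = {"quiet": True, "skip_download": True}
    with yt_dlp.YoutubeDL(opts) as ydl:
        return ydl.extract_info(url, download=False)


def summarize(info: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce yt-dlp's large info dict to a few fields useful for sourcing notes."""
    return {key: info.get(key) for key in FIELDS}
```

Calling `summarize(fetch_info(url))` would give a small dictionary that can be stored alongside a story's source log; fields a platform does not provide come back as `None`.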

Try multiple extraction strategies with automatic fallback:

Fast

Lightweight extraction for standard articles. Best for news sites and blogs.

Medium

HTTP requests with rotating user agents. Good for static content.

Heavy

Full JavaScript rendering with anti-bot bypass. For SPAs and protected sites.
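The three tiers above can be sketched as a simple cascade that tries each strategy in order and stops at the first one returning enough text. This is a minimal sketch under stated assumptions: the function names, the `min_chars` threshold, and the placeholder user-agent strings are hypothetical, and the heavy tier shown here uses plain Playwright without the stealth measures the skill describes.

```python
import random
from typing import Callable, List, Optional, Tuple

Extractor = Callable[[str], Optional[str]]

# Placeholder user-agent strings for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) example-scraper/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) example-scraper/1.0",
]


def tier_fast(url: str) -> Optional[str]:
    """Fast: let Trafilatura download and extract the article text."""
    import trafilatura

    downloaded = trafilatura.fetch_url(url)
    return trafilatura.extract(downloaded) if downloaded else None


def tier_medium(url: str) -> Optional[str]:
    """Medium: fetch with Requests and a randomly chosen user agent."""
    import requests
    import trafilatura

    headers = {"User-Agent": random.choice(USER_AGENTS)}
    resp = requests.get(url, headers=headers, timeout=15)
    resp.raise_for_status()
    return trafilatura.extract(resp.text)


def tier_heavy(url: str) -> Optional[str]:
    """Heavy: render the page in headless Chromium, then extract."""
    import trafilatura
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            html = page.content()
        finally:
            browser.close()
    return trafilatura.extract(html)


DEFAULT_TIERS: List[Tuple[str, Extractor]] = [
    ("fast", tier_fast),
    ("medium", tier_medium),
    ("heavy", tier_heavy),
]


def cascade(
    url: str,
    tiers: List[Tuple[str, Extractor]] = DEFAULT_TIERS,
    min_chars: int = 200,
) -> Optional[Tuple[str, str]]:
    """Return (tier_name, text) from the first tier that yields enough text."""
    for name, extract in tiers:
        try:
            text = extract(url)
        except Exception:
            continue  # a failed or missing tier just falls through to the next
        if text and len(text.strip()) >= min_chars:
            return name, text
    return None
```

Passing the tiers as a list keeps the ordering explicit and makes it easy to drop the heavy tier on machines where a browser is not installed.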

Type	Detection patterns
Paywall	"subscribe to continue", "you've reached your limit"
CAPTCHA	"verify you are human", "robot verification"
Rate limit	"too many requests", HTTP 429
Cloudflare	"checking your browser", "ddos protection"
Login required	"sign in to continue", "create an account"
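One way to apply the table above is a case-insensitive pattern check over the fetched page text, plus a status-code check for HTTP 429. The sketch below uses only the phrases listed in the table; the function name `detect_block` and the label strings are assumptions chosen for illustration.

```python
import re
from typing import Dict, List, Optional, Pattern

# Phrases taken from the detection table; labels are illustrative.
_RAW_PATTERNS: Dict[str, List[str]] = {
    "paywall": [r"subscribe to continue", r"you[’']ve reached your limit"],
    "captcha": [r"verify you are human", r"robot verification"],
    "rate_limit": [r"too many requests"],
    "cloudflare": [r"checking your browser", r"ddos protection"],
    "login_required": [r"sign in to continue", r"create an account"],
}

BLOCK_PATTERNS: Dict[str, List[Pattern[str]]] = {
    label: [re.compile(p, re.IGNORECASE) for p in patterns]
    for label, patterns in _RAW_PATTERNS.items()
}


def detect_block(text: str, status_code: Optional[int] = None) -> Optional[str]:
    """Return the first matching block type, or None if the page looks normal."""
    if status_code == 429:
        return "rate_limit"
    for label, patterns in BLOCK_PATTERNS.items():
        if any(p.search(text) for p in patterns):
            return label
    return None
```

A check like this fits naturally between tiers of the fallback: if a response is flagged, its text can be discarded and the next strategy tried. Phrase matching is a heuristic, so an article that merely quotes one of these phrases would be flagged too.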

# Clone the repository

git clone https://github.com/jamditis/claude-skills-journalism.git

# Copy the skill to your Claude config

cp -r claude-skills-journalism/web-scraping ~/.claude/skills/

Or download just this skill from the GitHub repository.


Cascade architecture, poison pill detection, and social media tools in one skill.