Development skill

Web scraping

Reliable, ethical web scraping with fallback strategies, anti-bot handling, and social media extraction.

When to use

What's included

Scraping cascade

Three-tier fallback: Trafilatura (fast) first, then Requests (plain HTTP), then Playwright (JavaScript rendering with stealth).

Poison pill detection

Detect paywalls, CAPTCHAs, rate limits, Cloudflare, and login walls with pattern matching.

Undocumented APIs

Find and use hidden APIs via browser dev tools, with examples for autocomplete endpoints.

Social media tools

yt-dlp for YouTube/TikTok, instaloader for Instagram, with metadata extraction and download patterns.
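
As a rough illustration of the metadata extraction mentioned above, here is a minimal sketch using yt-dlp's Python API. The function names fetch_video_metadata and summarize_metadata, and the choice of fields to keep, are assumptions made for this example and are not taken from the skill itself.

```python
from typing import Any, Dict


def fetch_video_metadata(url: str) -> Dict[str, Any]:
    """Fetch metadata for a single video without downloading the media."""
    import yt_dlp  # imported lazily so the helper below works without it

    options = {
        "quiet": True,          # suppress console output
        "skip_download": True,  # metadata only
        "noplaylist": True,     # treat the URL as a single video
    }
    with yt_dlp.YoutubeDL(options) as ydl:
        info = ydl.extract_info(url, download=False)
        # sanitize_info makes the result JSON-serializable
        return ydl.sanitize_info(info)


def summarize_metadata(info: Dict[str, Any]) -> Dict[str, Any]:
    """Keep a small, illustrative subset of fields; missing ones become None."""
    fields = ("id", "title", "uploader", "upload_date", "duration",
              "view_count", "webpage_url")
    return {field: info.get(field) for field in fields}
```

Calling summarize_metadata(fetch_video_metadata(url)) on a video URL would return a small dictionary suitable for logging or storage. Which fields are populated depends on the site.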

Scraping cascade architecture

Try multiple extraction strategies with automatic fallback:

Fast

Trafilatura

Lightweight extraction for standard articles. Best for news sites and blogs.

Medium

Requests + BeautifulSoup

HTTP requests with rotating user agents. Good for static content.

Heavy

Playwright with stealth

Full JavaScript rendering with anti-bot bypass. For SPAs and protected sites.
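
The three tiers above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the function names, the USER_AGENTS list, and the min_chars threshold are assumptions, and the Playwright tier here uses plain headless Chromium without any stealth plugin.

```python
import random
from typing import Callable, List, Optional, Tuple

# A strategy takes a URL and returns extracted text, or None if nothing was found.
Strategy = Callable[[str], Optional[str]]

# Illustrative user agents only; a real list would be longer and kept current.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def fetch_with_trafilatura(url: str) -> Optional[str]:
    """Fast tier: article extraction for news sites and blogs."""
    import trafilatura

    downloaded = trafilatura.fetch_url(url)
    if downloaded is None:
        return None
    return trafilatura.extract(downloaded)


def fetch_with_requests(url: str) -> Optional[str]:
    """Medium tier: plain HTTP with a randomly chosen user agent."""
    import requests
    from bs4 import BeautifulSoup

    headers = {"User-Agent": random.choice(USER_AGENTS)}
    response = requests.get(url, headers=headers, timeout=15)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text(separator="\n", strip=True)


def fetch_with_playwright(url: str) -> Optional[str]:
    """Heavy tier: render JavaScript in headless Chromium (no stealth here)."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle", timeout=30000)
            return page.inner_text("body")
        finally:
            browser.close()


def scrape_with_cascade(
    url: str,
    strategies: List[Tuple[str, Strategy]],
    min_chars: int = 200,
) -> Tuple[str, str]:
    """Try each strategy in order; return (strategy_name, text) from the first
    one that yields at least min_chars of text."""
    errors = []
    for name, strategy in strategies:
        try:
            text = strategy(url)
        except Exception as exc:  # any failure falls through to the next tier
            errors.append(f"{name}: {exc}")
            continue
        if text and len(text.strip()) >= min_chars:
            return name, text
        errors.append(f"{name}: too little text")
    raise RuntimeError("all strategies failed: " + "; ".join(errors))


DEFAULT_STRATEGIES: List[Tuple[str, Strategy]] = [
    ("trafilatura", fetch_with_trafilatura),
    ("requests", fetch_with_requests),
    ("playwright", fetch_with_playwright),
]
```

Passing the strategies as a list keeps the ordering explicit and lets a caller drop the heavy tier when a browser is not available. The third-party imports sit inside each function so a missing library only affects its own tier.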

Poison pill types

Type             Detection patterns
Paywall          "subscribe to continue", "you've reached your limit"
CAPTCHA          "verify you are human", "robot verification"
Rate limit       "too many requests", HTTP 429
Cloudflare       "checking your browser", "ddos protection"
Login required   "sign in to continue", "create an account"
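
The patterns in this table could be checked with a small matcher like the one below. It is a sketch under assumptions: the detect_poison_pill name, the type labels, and the case-insensitive substring matching are illustrative choices, and real pages may need more patterns than the ones listed here.

```python
import re
from typing import Dict, List, Optional

# Patterns copied from the table above; keys are illustrative type labels.
POISON_PATTERNS: Dict[str, List[str]] = {
    "paywall": [r"subscribe to continue", r"you['\u2019]ve reached your limit"],
    "captcha": [r"verify you are human", r"robot verification"],
    "rate_limit": [r"too many requests"],
    "cloudflare": [r"checking your browser", r"ddos protection"],
    "login_required": [r"sign in to continue", r"create an account"],
}


def detect_poison_pill(text: str, status_code: Optional[int] = None) -> Optional[str]:
    """Return the first matching poison pill type, or None if the page looks clean."""
    # HTTP 429 signals rate limiting regardless of the body text.
    if status_code == 429:
        return "rate_limit"
    lowered = text.lower()
    for kind, patterns in POISON_PATTERNS.items():
        if any(re.search(pattern, lowered) for pattern in patterns):
            return kind
    return None
```

A scraper could call this on each response and stop or switch strategy when the result is not None, rather than storing a block page as if it were article text. Short phrases such as "create an account" can also appear on ordinary pages, so a match is a signal to inspect, not proof of a block.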

Installation

# Clone the repository

git clone https://github.com/jamditis/claude-skills-journalism.git

# Copy the skill to your Claude config

cp -r claude-skills-journalism/web-scraping ~/.claude/skills/

Or download just this skill from the GitHub repository.

Extract what you need, ethically

Cascade architecture, poison pill detection, and social media tools in one skill.