Development skill

Python pipeline

Python data processing pipelines with a modular architecture, covering content processing workflows, dispatcher patterns, and batch processing systems.

What's included

Modular architecture

Project structure with workflow orchestrators, dispatchers, processor classes, and service integrations.

Dispatcher pattern

Route content to appropriate processors based on URL patterns and content types using Protocol classes.
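
A minimal sketch of the idea, with a hypothetical YouTubeProcessor standing in for real processors; the skill's actual classes may differ:

from typing import Protocol

class Processor(Protocol):
    def can_handle(self, url: str, content_type: str) -> bool: ...
    def process(self, item: dict) -> dict: ...

class YouTubeProcessor:
    def can_handle(self, url: str, content_type: str) -> bool:
        return "youtube.com" in url or content_type == "video"

    def process(self, item: dict) -> dict:
        # Extract transcript, metadata, etc.
        return {**item, "processed_by": "youtube"}

class Dispatcher:
    def __init__(self, processors: list[Processor]):
        self.processors = processors

    def dispatch(self, item: dict) -> dict:
        # Hand each item to the first processor that claims it
        for processor in self.processors:
            if processor.can_handle(item["url"], item.get("type", "")):
                return processor.process(item)
        raise ValueError(f"No processor for {item['url']}")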

Google Sheets integration

Complete gspread patterns for reading, writing, batch updates, and finding rows by ID.
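
A rough sketch of the core gspread calls; the credentials path, spreadsheet key, and cell ranges below are placeholders:

import gspread

gc = gspread.service_account(filename="credentials.json")   # placeholder path
ws = gc.open_by_key("SPREADSHEET_ID").worksheet("Sheet1")

rows = ws.get_all_records()          # read every row as a dict keyed by header

cell = ws.find("item-123")           # locate the first cell matching an ID
if cell:
    ws.update_cell(cell.row, 3, "processed")   # write a single cell in that row

ws.append_row(["item-456", "pending", ""])     # add a new row at the bottom

# Batch updates cut API calls dramatically compared to per-cell writes
ws.batch_update([
    {"range": "B2", "values": [["processed"]]},
    {"range": "B3", "values": [["failed"]]},
])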

Progress tracking

Resume capability with JSON-based state persistence, processed ID tracking, and error logging.
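
One possible shape for the tracker, assuming a JSON file named progress.json; the skill's actual state format may differ:

import json
from pathlib import Path

class ProgressTracker:
    def __init__(self, state_file: str = "progress.json"):
        self.path = Path(state_file)
        self.state = {"processed_ids": [], "errors": []}
        if self.path.exists():
            # Resume from the last saved state instead of starting over
            self.state = json.loads(self.path.read_text())

    def is_done(self, item_id: str) -> bool:
        return item_id in self.state["processed_ids"]

    def mark_done(self, item_id: str) -> None:
        self.state["processed_ids"].append(item_id)
        self._save()

    def log_error(self, item_id: str, error: str) -> None:
        self.state["errors"].append({"id": item_id, "error": error})
        self._save()

    def _save(self) -> None:
        self.path.write_text(json.dumps(self.state, indent=2))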

Gemini AI integration

Entity extraction, categorization, and image classification with cost tracking.
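
A hedged sketch using the google-generativeai package; the model name, prompt, and token accounting are illustrative, not the skill's exact wrapper:

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")            # in practice, load from env/config
model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name

def extract_entities(text: str) -> tuple[str, int]:
    response = model.generate_content(
        f"List the people and organizations mentioned in:\n\n{text}"
    )
    # usage_metadata exposes token counts, which feed the cost tracking
    tokens = response.usage_metadata.total_token_count
    return response.text, tokens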

Rate limiting

Decorators and custom rate limiters with backoff for API calls.
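
A minimal decorator sketch assuming a fixed-interval limit with exponential backoff on failure; names and defaults are illustrative:

import functools
import time

def rate_limited(calls_per_minute: int = 30, max_retries: int = 3):
    min_interval = 60.0 / calls_per_minute
    last_call = [0.0]

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                # Enforce the minimum gap between calls
                wait = min_interval - (time.monotonic() - last_call[0])
                if wait > 0:
                    time.sleep(wait)
                last_call[0] = time.monotonic()
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries - 1:
                        raise
                    time.sleep(2 ** attempt)   # exponential backoff before retrying
        return wrapper
    return decorator

@rate_limited(calls_per_minute=15)
def call_api(payload: dict) -> dict:
    ...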

Project structure

File                         Purpose
workflow.py                  Main orchestrator
dispatcher.py                Content-type router
processors/base.py           Abstract base class
services/sheets_service.py   Google Sheets integration
services/ai_service.py       Gemini API wrapper
utils/rate_limiter.py        API rate limiting
config.py                    Environment configuration

Common pitfalls

Watch out

Google Sheets cell limits

Max cell length is 50,000 characters. Truncate long content before writing.
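
For example, a small guard before any write (the 100-character margin is an arbitrary safety buffer):

MAX_CELL_CHARS = 50_000

def safe_cell(value: str) -> str:
    # Truncate with a margin so nothing pushes past the hard limit
    if len(value) <= MAX_CELL_CHARS:
        return value
    return value[: MAX_CELL_CHARS - 100] + "…"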

Watch out

CSV encoding issues

Use encoding='utf-8-sig' when reading CSV files so a leading byte-order mark doesn't end up prepended to the first column name.
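
For example, with the standard csv module (the filename is a placeholder):

import csv

with open("export.csv", newline="", encoding="utf-8-sig") as f:
    for row in csv.DictReader(f):
        ...  # header keys arrive without a stray "\ufeff" prefix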

Best practice

Cache API responses

Use @lru_cache to avoid redundant API calls and reduce costs.
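
A quick sketch; ai_service.categorize is a hypothetical name for a function in the skill's Gemini wrapper, and lru_cache only helps when identical, hashable arguments recur:

from functools import lru_cache

from services import ai_service   # the skill's Gemini wrapper (see project structure)

@lru_cache(maxsize=1024)
def categorize(text: str) -> str:
    # Identical inputs are answered from the cache instead of re-billing the API
    return ai_service.categorize(text)   # hypothetical function name on that module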

Installation

# Clone the repository
git clone https://github.com/jamditis/claude-skills-journalism.git

# Copy the skill to your Claude config
cp -r claude-skills-journalism/python-pipeline ~/.claude/skills/

Or download just this skill from the GitHub repository.

Build data pipelines that don't break

Modular architecture, rate limiting, and progress tracking in one skill.

View on GitHub