Development skill

Python pipeline

Python data processing pipelines with a modular architecture, covering content processing workflows, dispatcher patterns, and batch processing systems.

What's included

Modular architecture

Project structure with workflow orchestrators, dispatchers, processor classes, and service integrations.

Dispatcher pattern

Route content to appropriate processors based on URL patterns and content types using Protocol classes.
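
A minimal sketch of the idea, with a hypothetical YouTubeProcessor standing in for real processors; the skill's actual classes may differ:

from typing import Protocol

class Processor(Protocol):
    def can_handle(self, url: str, content_type: str) -> bool: ...
    def process(self, item: dict) -> dict: ...

class YouTubeProcessor:
    def can_handle(self, url: str, content_type: str) -> bool:
        return "youtube.com" in url or content_type == "video"

    def process(self, item: dict) -> dict:
        # Extract transcript, metadata, etc.
        return {**item, "processed_by": "youtube"}

class Dispatcher:
    def __init__(self, processors: list[Processor]):
        self.processors = processors

    def dispatch(self, item: dict) -> dict:
        # Hand each item to the first processor that claims it
        for processor in self.processors:
            if processor.can_handle(item["url"], item.get("type", "")):
                return processor.process(item)
        raise ValueError(f"No processor for {item['url']}")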

Google Sheets integration

Complete gspread patterns for reading, writing, batch updates, and finding rows by ID.
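
A rough sketch of the core gspread calls; the credentials path, spreadsheet key, and cell ranges below are placeholders:

import gspread

gc = gspread.service_account(filename="credentials.json")   # placeholder path
ws = gc.open_by_key("SPREADSHEET_ID").worksheet("Sheet1")

rows = ws.get_all_records()          # read every row as a dict keyed by header

cell = ws.find("item-123")           # locate the first cell matching an ID
if cell:
    ws.update_cell(cell.row, 3, "processed")   # write a single cell in that row

ws.append_row(["item-456", "pending", ""])     # add a new row at the bottom

# Batch updates cut API calls dramatically compared to per-cell writes
ws.batch_update([
    {"range": "B2", "values": [["processed"]]},
    {"range": "B3", "values": [["failed"]]},
])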

Progress tracking

Resume capability with JSON-based state persistence, processed ID tracking, and error logging.
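
One possible shape for the tracker, assuming a JSON file named progress.json; the skill's actual state format may differ:

import json
from pathlib import Path

class ProgressTracker:
    def __init__(self, state_file: str = "progress.json"):
        self.path = Path(state_file)
        self.state = {"processed_ids": [], "errors": []}
        if self.path.exists():
            # Resume from the last saved state instead of starting over
            self.state = json.loads(self.path.read_text())

    def is_done(self, item_id: str) -> bool:
        return item_id in self.state["processed_ids"]

    def mark_done(self, item_id: str) -> None:
        self.state["processed_ids"].append(item_id)
        self._save()

    def log_error(self, item_id: str, error: str) -> None:
        self.state["errors"].append({"id": item_id, "error": error})
        self._save()

    def _save(self) -> None:
        self.path.write_text(json.dumps(self.state, indent=2))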

Gemini AI integration

Entity extraction, categorization, and image classification with cost tracking.
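
A hedged sketch using the google-generativeai package; the model name, prompt, and token accounting are illustrative, not the skill's exact wrapper:

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")            # in practice, load from env/config
model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name

def extract_entities(text: str) -> tuple[str, int]:
    response = model.generate_content(
        f"List the people and organizations mentioned in:\n\n{text}"
    )
    # usage_metadata exposes token counts, which feed the cost tracking
    tokens = response.usage_metadata.total_token_count
    return response.text, tokens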

Rate limiting

Decorators and custom rate limiters with backoff for API calls.
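
A minimal decorator sketch assuming a fixed-interval limit with exponential backoff on failure; names and defaults are illustrative:

import functools
import time

def rate_limited(calls_per_minute: int = 30, max_retries: int = 3):
    min_interval = 60.0 / calls_per_minute
    last_call = [0.0]

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                # Enforce the minimum gap between calls
                wait = min_interval - (time.monotonic() - last_call[0])
                if wait > 0:
                    time.sleep(wait)
                last_call[0] = time.monotonic()
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries - 1:
                        raise
                    time.sleep(2 ** attempt)   # exponential backoff before retrying
        return wrapper
    return decorator

@rate_limited(calls_per_minute=15)
def call_api(payload: dict) -> dict:
    ...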

Project structure

File                         Purpose
workflow.py                  Main orchestrator
dispatcher.py                Content-type router
processors/base.py           Abstract base class
services/sheets_service.py   Google Sheets integration
services/ai_service.py       Gemini API wrapper
utils/rate_limiter.py        API rate limiting
config.py                    Environment configuration

Common pitfalls

Watch out

Google Sheets cell limits

Max cell length is 50,000 characters. Truncate long content before writing.
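
For example, a small guard before any write (the 100-character margin is an arbitrary safety buffer):

MAX_CELL_CHARS = 50_000

def safe_cell(value: str) -> str:
    # Truncate with a margin so nothing pushes past the hard limit
    if len(value) <= MAX_CELL_CHARS:
        return value
    return value[: MAX_CELL_CHARS - 100] + "…"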

Watch out

CSV encoding issues

Use encoding='utf-8-sig' when reading CSV files so a leading byte-order mark doesn't end up prepended to the first column name.
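
For example, with the standard csv module (the filename is a placeholder):

import csv

with open("export.csv", newline="", encoding="utf-8-sig") as f:
    for row in csv.DictReader(f):
        ...  # header keys arrive without a stray "\ufeff" prefix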

Best practice

Cache API responses

Use @lru_cache to avoid redundant API calls and reduce costs.
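
A quick sketch; ai_service.categorize is a hypothetical name for a function in the skill's Gemini wrapper, and lru_cache only helps when identical, hashable arguments recur:

from functools import lru_cache

from services import ai_service   # the skill's Gemini wrapper (see project structure)

@lru_cache(maxsize=1024)
def categorize(text: str) -> str:
    # Identical inputs are answered from the cache instead of re-billing the API
    return ai_service.categorize(text)   # hypothetical function name on that module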

Installation

# Clone the repository
git clone https://github.com/jamditis/claude-skills-journalism.git

# Copy the skill to your Claude config
cp -r claude-skills-journalism/python-pipeline ~/.claude/skills/

Or download just this skill from the GitHub repository.

Build data pipelines that don't break

Modular architecture, rate limiting, and progress tracking in one skill.

View on GitHub