## When to use
- Building content processing workflows
- Implementing dispatcher patterns for routing content types
- Integrating Google Sheets and Drive APIs
- Creating batch processing systems with resume capability
## What's included
### Modular architecture
Project structure with workflow orchestrators, dispatchers, processor classes, and service integrations.
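As a sketch of the shape `processors/base.py` could take (the names here are illustrative, not the repository's actual code), an abstract base class keeps every processor interchangeable from the orchestrator's point of view:

```python
from abc import ABC, abstractmethod


class BaseProcessor(ABC):
    """Every content processor implements the same two hooks, so the
    workflow orchestrator can treat them all interchangeably."""

    @abstractmethod
    def can_handle(self, url: str) -> bool:
        """Return True if this processor should receive the item."""

    @abstractmethod
    def process(self, url: str) -> dict:
        """Fetch and transform the item; return a row ready for output."""

    def run(self, url: str) -> dict:
        # Shared template-method wrapper: subclasses implement only the
        # two hooks above and inherit the guard for free.
        if not self.can_handle(url):
            raise ValueError(f"{type(self).__name__} cannot handle {url}")
        return self.process(url)
```

New content types then become new subclasses; the orchestrator never changes.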
### Dispatcher pattern
Route content to appropriate processors based on URL patterns and content types using Protocol classes.
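A minimal sketch of the pattern (the processor names and URL patterns are illustrative): `typing.Protocol` lets the dispatcher accept any object with the right methods, with no inheritance required:

```python
import re
from typing import Protocol


class Processor(Protocol):
    def can_handle(self, url: str) -> bool: ...
    def process(self, url: str) -> dict: ...


class VideoProcessor:
    _pattern = re.compile(r"(youtube\.com|youtu\.be)")

    def can_handle(self, url: str) -> bool:
        return bool(self._pattern.search(url))

    def process(self, url: str) -> dict:
        return {"type": "video", "url": url}


class ArticleProcessor:
    def can_handle(self, url: str) -> bool:
        return True  # catch-all fallback; register it last

    def process(self, url: str) -> dict:
        return {"type": "article", "url": url}


class Dispatcher:
    def __init__(self, processors: list[Processor]):
        self.processors = processors

    def dispatch(self, url: str) -> dict:
        # First processor that claims the URL wins, so order matters.
        for processor in self.processors:
            if processor.can_handle(url):
                return processor.process(url)
        raise ValueError(f"No processor for {url}")
```

Because `Processor` is structural, the concrete classes never import or subclass it; anything with matching `can_handle`/`process` methods can be registered.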
### Google Sheets integration
Complete gspread patterns for reading, writing, batch updates, and finding rows by ID.
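Two of those patterns, sketched as duck-typed helpers (the function names are my own; `ws` is any gspread `Worksheet`): fetch the whole ID column in one request instead of scanning cell by cell, and write many rows in a single `update()` call:

```python
def find_row_by_id(ws, record_id, id_col=1):
    """Return the 1-based row whose ID column equals record_id, else None.

    One col_values() call fetches the entire column -- far cheaper than
    one API request per cell."""
    for row, value in enumerate(ws.col_values(id_col), start=1):
        if value == str(record_id):
            return row
    return None


def batch_write(ws, start_row, rows):
    """Write a 2-D list of rows in a single update() call.

    Sketch only: the A1-range math below assumes at most 26 columns."""
    width = max(len(r) for r in rows)
    a1 = f"A{start_row}:{chr(ord('A') + width - 1)}{start_row + len(rows) - 1}"
    # Keyword arguments sidestep the values/range positional-order change
    # between gspread 5 and gspread 6.
    ws.update(values=rows, range_name=a1)
```

Grouping writes this way also helps stay under the Sheets API's per-minute request quota.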
### Progress tracking
Resume capability with JSON-based state persistence, processed ID tracking, and error logging.
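A sketch of JSON-backed resume state (the file layout is an assumption, not the skill's exact schema): skip anything already marked done, and write the file atomically so an interrupted run can't corrupt it:

```python
import json
import os


class ProgressTracker:
    """Persist processed IDs and per-item errors so a crashed batch run
    can resume where it left off."""

    def __init__(self, path="progress.json"):
        self.path = path
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                state = json.load(f)
        else:
            state = {"processed": [], "errors": {}}
        self.processed = set(state["processed"])
        self.errors = state["errors"]

    def is_done(self, item_id: str) -> bool:
        return item_id in self.processed

    def mark_done(self, item_id: str) -> None:
        self.processed.add(item_id)
        self._save()

    def mark_error(self, item_id: str, message: str) -> None:
        self.errors[item_id] = message
        self._save()

    def _save(self) -> None:
        # Write to a temp file, then rename: os.replace is atomic, so a
        # crash mid-save can never leave a half-written state file.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(
                {"processed": sorted(self.processed), "errors": self.errors},
                f, indent=2,
            )
        os.replace(tmp, self.path)
```

The main loop then becomes `if tracker.is_done(row_id): continue`, which is what makes re-running the same batch safe.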
### Gemini AI integration
Entity extraction, categorization, and image classification with cost tracking.
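The cost-tracking side can be sketched independently of the API itself. The per-1k-token prices below are placeholder arguments, not current Gemini pricing; in the real pipeline the token counts would come from each response's usage metadata:

```python
class CostTracker:
    """Accumulate token usage across API calls and estimate spend."""

    def __init__(self, input_price_per_1k: float, output_price_per_1k: float):
        self.input_price = input_price_per_1k
        self.output_price = output_price_per_1k
        self.input_tokens = 0
        self.output_tokens = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Token counts are reported per response by the Gemini SDK's
        # usage metadata; here they are passed in directly.
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def total_cost(self) -> float:
        return (self.input_tokens / 1000 * self.input_price
                + self.output_tokens / 1000 * self.output_price)
```

Logging `tracker.total_cost` after each batch makes runaway spend visible long before the invoice does.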
### Rate limiting
Decorators and custom rate limiters with backoff for API calls.
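A sketch of both pieces (parameter names are illustrative): a decorator that spaces calls out to a fixed rate, and a second decorator that retries with exponential backoff plus jitter:

```python
import functools
import random
import time


def rate_limited(calls_per_minute: float):
    """Sleep as needed so the wrapped function runs at most N times/minute."""
    min_interval = 60.0 / calls_per_minute

    def decorator(func):
        last_call = [0.0]  # mutable cell shared across calls

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = last_call[0] + min_interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            last_call[0] = time.monotonic()
            return func(*args, **kwargs)
        return wrapper
    return decorator


def with_backoff(max_retries=5, base_delay=1.0, retry_on=(Exception,)):
    """Retry with exponential backoff: base_delay * 2**attempt, plus jitter."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries - 1:
                        raise  # out of retries; surface the real error
                    time.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
        return wrapper
    return decorator
```

Stacking them (`@with_backoff(...)` above `@rate_limited(...)`) gives each retry its own rate-limit wait, which is usually what an API quota wants.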
## Project structure
| File | Purpose |
|---|---|
| `workflow.py` | Main orchestrator |
| `dispatcher.py` | Content-type router |
| `processors/base.py` | Abstract base class for processors |
| `services/sheets_service.py` | Google Sheets integration |
| `services/ai_service.py` | Gemini API wrapper |
| `utils/rate_limiter.py` | API rate limiting |
| `config.py` | Environment configuration |
## Common pitfalls
### Google Sheets cell limits
Google Sheets rejects writes in which any cell exceeds 50,000 characters, which fails the whole request. Truncate long content before writing.
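A sketch of a guard to apply before every write (the marker text is arbitrary):

```python
SHEETS_CELL_LIMIT = 50_000  # Google Sheets' documented per-cell maximum


def truncate_cell(value: str, limit: int = SHEETS_CELL_LIMIT) -> str:
    """Clamp a string so one oversized cell can't fail a whole batch write."""
    if len(value) <= limit:
        return value
    marker = " [truncated]"
    return value[: limit - len(marker)] + marker
```

Running every outgoing cell through this is cheaper than retrying a rejected batch.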
### CSV encoding issues
Read CSV files with `encoding='utf-8-sig'` so a UTF-8 byte-order mark, common in Excel exports, is stripped instead of corrupting the first column name.
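For example (a hypothetical helper): `'utf-8-sig'` transparently strips the BOM, which plain `'utf-8'` would leave glued to the first header as `\ufeffid`:

```python
import csv


def read_csv_rows(path: str) -> list[dict]:
    # newline="" is the csv module's documented requirement for open();
    # utf-8-sig decodes a leading BOM to nothing if one is present.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))
```

The same `'utf-8-sig'` also reads BOM-less UTF-8 files unchanged, so it is a safe default for ingestion.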
### Redundant API calls
Identical requests are billed every time unless you cache. Memoize lookups with `functools.lru_cache` to avoid paying twice for the same input.
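A sketch of the idea (the stub below stands in for a real API call); note that `lru_cache` only works when all arguments are hashable:

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def classify(text: str) -> str:
    # Stands in for an expensive API call; repeated identical inputs are
    # answered from the cache instead of re-billed.
    classify.api_calls += 1
    return "entity" if text.istitle() else "other"


classify.api_calls = 0  # instrumentation so the savings are visible
```

`classify.cache_info()` reports hits and misses, which is a quick way to confirm the cache is actually earning its keep.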
## Installation
```shell
# Clone the repository
git clone https://github.com/jamditis/claude-skills-journalism.git

# Copy the skill to your Claude config
cp -r claude-skills-journalism/python-pipeline ~/.claude/skills/
```
Or download just this skill from the GitHub repository.