When to use
- Building content archives from multiple sources
- Implementing AI-powered categorization and tagging
- Extracting entities and relationships for knowledge graphs
- Integrating OCR, web scraping, and social media sources
- Generating accessible PDFs for archival preservation
What's included
Multi-source integration
OCR pipeline for newspapers, web scraping for articles, and social media transcripts. Unified schema with 35+ fields.
AI categorization
Taxonomy-based classification with Gemini API. Thematic categories, key concepts, tags, eras, and scope types.
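One way to keep model output inside the taxonomy is to check every returned label against the allowed values. The sketch below assumes the model response has already been parsed into a dict; the taxonomy values, field names, and `filter_to_taxonomy` are hypothetical placeholders, and no Gemini API call is shown.

```python
# Hypothetical taxonomy: the real category, era, and scope values are not
# listed in this document, so these are placeholders.
TAXONOMY = {
    "categories": {"Press Criticism", "Media Economics", "Civic Journalism"},
    "eras": {"1990s", "2000s", "2010s"},
    "scope_types": {"Local", "National", "Global"},
}


def filter_to_taxonomy(response, taxonomy=TAXONOMY):
    """Split a parsed model response into accepted and rejected labels."""
    accepted, rejected = {}, {}
    for field, allowed in taxonomy.items():
        values = response.get(field, [])
        if isinstance(values, str):  # tolerate a single string value
            values = [values]
        accepted[field] = [v for v in values if v in allowed]
        rejected[field] = [v for v in values if v not in allowed]
    return accepted, rejected


accepted, rejected = filter_to_taxonomy({
    "categories": ["Press Criticism", "Sports"],
    "eras": "2000s",
})
```

Rejected labels can be logged for review rather than silently dropped, which makes it easier to spot gaps in the taxonomy itself.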
Entity extraction
Extract Person, Organization, Work, Concept, Event, and Location entities. Build relationship graphs with deduplication.
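Deduplication could be approached by comparing entities on a normalized name key. This is a minimal sketch under that assumption, not the skill's actual method; `normalize_name`, `deduplicate`, and the sample names are hypothetical.

```python
import re
import unicodedata


def normalize_name(name):
    """Build a comparison key: strip accents, case, punctuation, extra spaces."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def deduplicate(mentions):
    """Merge (entity_type, name) mentions that share a normalized key.

    The first spelling seen becomes the canonical name; later variants
    are kept as aliases.
    """
    entities = {}
    for entity_type, name in mentions:
        key = (entity_type, normalize_name(name))
        if key not in entities:
            entities[key] = {"type": entity_type, "name": name, "aliases": []}
        elif name != entities[key]["name"] and name not in entities[key]["aliases"]:
            entities[key]["aliases"].append(name)
    return list(entities.values())


merged = deduplicate([
    ("Person", "Jane Doe"),
    ("Person", "jane doe"),
    ("Organization", "Example Press"),
    ("Person", "Jane  Doe."),
])
```

Keying on the entity type as well as the name keeps a Person and an Organization with the same label from being merged.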
PDF generation
WCAG 2.1 accessible PDFs with ReportLab. Metadata, summaries, and full text for archival preservation.
Data validation
Required/critical/optional field validation. Hallucination detection for AI responses. Quality scoring.
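Tiered validation with a quality score might look like the following. The field names, tier assignments, weights, and sample record are all assumptions made for illustration; the real schema's fields are not listed here.

```python
# Hypothetical field tiers: the real schema's field names are not listed here.
REQUIRED = ("record_id", "title", "source")
CRITICAL = ("date", "full_text")
OPTIONAL = ("summary", "tags", "author")

# Assumed weights for the quality score; they sum to 1.0.
WEIGHTS = {"required": 0.5, "critical": 0.3, "optional": 0.2}


def is_present(record, field):
    """A field counts as present when it holds a non-empty value."""
    value = record.get(field)
    return value is not None and value != "" and value != []


def validate(record):
    """Return missing fields per tier, a validity flag, and a 0-1 score."""
    tiers = {"required": REQUIRED, "critical": CRITICAL, "optional": OPTIONAL}
    missing = {}
    score = 0.0
    for tier, fields in tiers.items():
        missing[tier] = [f for f in fields if not is_present(record, f)]
        filled = len(fields) - len(missing[tier])
        score += WEIGHTS[tier] * filled / len(fields)
    return {
        "valid": not missing["required"],  # only required fields block a record
        "missing": missing,
        "score": round(score, 2),
    }


report = validate({
    "record_id": "NYT-0001",  # placeholder ID; the real record ID format is not shown
    "title": "Example headline",
    "source": "New York Times",
    "date": "",
    "full_text": "Body text.",
    "tags": ["press"],
})
```

In this sketch a record with missing critical fields still passes but scores lower, so it can be queued for review instead of being rejected outright.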
Frontend export
JSON/CSV exports for frontend consumption. Entity and relationship data for visualization and search.
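A JSON and CSV export can be sketched with the standard library alone. The record fields and the choice to join list values with "; " are assumptions, not the skill's documented output format.

```python
import csv
import io
import json


def to_json(records):
    """Serialize records as pretty-printed JSON, keeping non-ASCII text."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records, fieldnames):
    """Serialize records as CSV; list values are joined with '; '."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # skip keys that are not exported columns
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        row = {
            key: "; ".join(value) if isinstance(value, list) else value
            for key, value in record.items()
        }
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical entity records for illustration.
records = [
    {"id": "P-0001", "type": "Person", "name": "Jane Doe", "aliases": ["J. Doe"]},
    {"id": "O-0001", "type": "Organization", "name": "Example Press", "aliases": []},
]
json_text = to_json(records)
csv_text = to_csv(records, ["id", "type", "name", "aliases"])
```

JSON keeps nested values such as alias lists intact, while the CSV form flattens them for spreadsheet use.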
Entity types
| Type | ID prefix |
|---|---|
| Person | P-0001 |
| Organization | O-0001 |
| Work | W-0001 |
| Concept | C-0001 |
| Event | E-0001 |
| Location | L-0001 |
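The ID pattern in the table can be sketched as a pair of helpers. The four-digit zero padding and a counter starting at 1 are inferred from the examples above; the function names and the behavior past 9999 are assumptions.

```python
import re

# Prefixes from the entity type table above.
ENTITY_PREFIXES = {
    "Person": "P",
    "Organization": "O",
    "Work": "W",
    "Concept": "C",
    "Event": "E",
    "Location": "L",
}

# One prefix letter, a hyphen, then at least four digits (assumed format).
ID_PATTERN = re.compile(r"([POWCEL])-(\d{4,})")


def format_entity_id(entity_type, number):
    """Format an ID such as 'P-0001' from a type name and a positive counter."""
    if number < 1:
        raise ValueError("number must be positive")
    return f"{ENTITY_PREFIXES[entity_type]}-{number:04d}"


def parse_entity_id(entity_id):
    """Return (entity_type, number) or raise ValueError for malformed IDs."""
    match = ID_PATTERN.fullmatch(entity_id)
    if match is None:
        raise ValueError(f"not an entity ID: {entity_id!r}")
    prefix_to_type = {prefix: name for name, prefix in ENTITY_PREFIXES.items()}
    return prefix_to_type[match.group(1)], int(match.group(2))
```

For example, `format_entity_id("Person", 1)` yields `"P-0001"`, and `parse_entity_id("W-0123")` yields `("Work", 123)`.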
Relationship types
Content relationships
- Mentions
- Criticizes
- Cites
- Discusses
- Expands On
- Supports
Historical relationships
- Founded By
- Pioneered
- Inspired By
Structural relationships
- Affiliated With
- Published In
- Originated By
- Occurred At
- Owns / Owned By
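The relationship vocabulary above could be modeled as typed edges between entity IDs. This is a sketch, not the skill's own data model: the `Relationship` tuple, its field names, and the split of "Owns / Owned By" into two directional types are assumptions.

```python
from typing import NamedTuple

# Relationship types grouped as in the lists above.
RELATIONSHIP_TYPES = {
    "content": {"Mentions", "Criticizes", "Cites", "Discusses",
                "Expands On", "Supports"},
    "historical": {"Founded By", "Pioneered", "Inspired By"},
    "structural": {"Affiliated With", "Published In", "Originated By",
                   "Occurred At", "Owns", "Owned By"},
}


class Relationship(NamedTuple):
    source_id: str  # e.g. an entity ID such as "O-0001"
    rel_type: str
    target_id: str


def category_of(relationship_type):
    """Return the category a relationship type belongs to."""
    for category, types in RELATIONSHIP_TYPES.items():
        if relationship_type in types:
            return category
    raise ValueError(f"unknown relationship type: {relationship_type!r}")


def make_relationship(source_id, relationship_type, target_id):
    """Build an edge, rejecting types outside the known vocabulary."""
    category_of(relationship_type)  # raises on unknown types
    return Relationship(source_id, relationship_type, target_id)


edge = make_relationship("O-0001", "Founded By", "P-0001")
```

Rejecting unknown types at construction time keeps free-text relationship labels out of the graph.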
Record ID prefixes
| Source | Prefix |
|---|---|
| New York Times | NYT |
| Columbia Journalism Review | CJR |
| PressThink | PT |
| Twitter | TW |
| YouTube | YT |
| Newspaper (OCR) | NEWS |
Installation
# Clone the repository
git clone https://github.com/jamditis/claude-skills-journalism.git
# Copy the skill to your Claude config
cp -r claude-skills-journalism/digital-archive ~/.claude/skills/
Or download just this skill from the GitHub repository.