Digital archive

Build production-quality digital archives with AI-powered categorization, entity extraction, and knowledge graph construction.

What's included

Multi-source integration

OCR pipeline for newspapers, web scraping for articles, and transcript ingestion for social media. Unified schema with 35+ fields.
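As a rough sketch of what a unified schema looks like in practice, the snippet below models a small, illustrative subset of fields (the actual 35+ field names are not listed here, so these are assumptions) and maps one source type into it:

```python
from dataclasses import dataclass, field

# Illustrative subset of the unified schema; field names are
# assumptions, not the skill's actual field list.
@dataclass
class ArchiveRecord:
    record_id: str                 # e.g. "NEWS-0001"
    source: str                    # "newspaper", "nyt", "twitter", ...
    title: str
    date: str                      # ISO 8601, may be empty until extracted
    full_text: str
    categories: list = field(default_factory=list)
    tags: list = field(default_factory=list)

def from_ocr(page_text: str, seq: int) -> ArchiveRecord:
    """Map one OCR'd newspaper page into the unified schema."""
    return ArchiveRecord(
        record_id=f"NEWS-{seq:04d}",
        source="newspaper",
        title=page_text.splitlines()[0][:120] if page_text else "",
        date="",
        full_text=page_text,
    )
```

Each source (web scrape, transcript, OCR) gets its own mapper, and everything downstream works against the one record shape.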

AI categorization

Taxonomy-based classification with Gemini API. Thematic categories, key concepts, tags, eras, and scope types.
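A minimal sketch of the taxonomy-constrained flow: build a prompt that restricts the model to known labels, then filter the reply against the taxonomy. The category names here are placeholders, and the live model call (e.g. via the Gemini API client) is omitted:

```python
import json

# Placeholder taxonomy; the real skill ships its own category list.
TAXONOMY = {"Press criticism", "Media economics", "Platform power"}

def build_prompt(text: str) -> str:
    """Ask the model to classify using ONLY the known taxonomy labels."""
    return (
        "Classify this archive record. Choose categories ONLY from: "
        + ", ".join(sorted(TAXONOMY))
        + '\nReturn JSON: {"categories": [...], "tags": [...]}\n\n'
        + text
    )

def parse_classification(raw: str) -> dict:
    """Parse the model reply, dropping any category outside the taxonomy
    (a basic guard against hallucinated labels)."""
    data = json.loads(raw)
    data["categories"] = [c for c in data.get("categories", []) if c in TAXONOMY]
    return data
```

The same pattern extends to key concepts, eras, and scope types: closed vocabularies in the prompt, post-hoc filtering on the response.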

Entity extraction

Extract Person, Organization, Work, Concept, Event, Location entities. Build relationship graphs with deduplication.
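A small sketch of the deduplication step, assuming a case- and whitespace-insensitive match on entity names and the type-prefixed ID scheme shown in the table below (P-0001, O-0001, and so on):

```python
PREFIX = {"Person": "P", "Organization": "O", "Work": "W",
          "Concept": "C", "Event": "E", "Location": "L"}

def dedupe_entities(mentions):
    """Collapse repeated mentions of the same entity and assign
    type-prefixed sequential IDs, counting per entity type."""
    seen, counters, entities = {}, {}, []
    for etype, name in mentions:
        key = (etype, " ".join(name.lower().split()))  # normalize name
        if key in seen:
            continue  # already catalogued; reuse the existing entity
        counters[etype] = counters.get(etype, 0) + 1
        eid = f"{PREFIX[etype]}-{counters[etype]:04d}"
        seen[key] = eid
        entities.append({"id": eid, "type": etype, "name": name})
    return entities
```

Real deduplication may also need alias handling ("NYT" vs. "The New York Times"); exact normalized matching is just the floor.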

PDF generation

WCAG 2.1 accessible PDFs with ReportLab. Metadata, summaries, and full text for archival preservation.
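As a sketch of the accessibility-relevant metadata such a pipeline prepares, the function below assembles document-level properties a PDF generator would set; the actual ReportLab rendering step is omitted, and the field names here are illustrative:

```python
def pdf_document_properties(record: dict) -> dict:
    """Document-level properties for an accessible PDF. The rendering
    step (ReportLab) is omitted; these names are illustrative."""
    return {
        "title": record["title"],
        "author": record.get("byline", "Unknown"),
        "subject": record.get("summary", ""),
        "lang": "en-US",          # declared language aids screen readers
        "displayDocTitle": True,  # show the title, not the filename
    }
```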

Data validation

Required/critical/optional field validation. Hallucination detection for AI responses. Quality scoring.
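A minimal sketch of tiered validation with a completeness-based quality score; the field names and the scoring formula are assumptions, not the skill's actual rules:

```python
# Illustrative field tiers; the real skill defines its own.
REQUIRED = {"record_id", "source", "title", "full_text"}
CRITICAL = {"date", "categories"}
OPTIONAL = {"tags", "summary"}

def validate(record: dict):
    """Return (errors, warnings, quality score in [0, 1]).
    Missing required fields are errors; missing critical fields
    are warnings; the score is the fraction of fields filled."""
    errors = [f for f in REQUIRED if not record.get(f)]
    warnings = [f for f in CRITICAL if not record.get(f)]
    all_fields = REQUIRED | CRITICAL | OPTIONAL
    filled = sum(1 for f in all_fields if record.get(f))
    return errors, warnings, round(filled / len(all_fields), 2)
```

Hallucination checks (e.g. rejecting AI-proposed categories that aren't in the taxonomy) layer on top of this structural pass.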

Frontend export

JSON/CSV exports for frontend consumption. Entity and relationship data for visualization and search.
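A short sketch of dual-format export using only the standard library; the output shape is an assumption about what a frontend might consume:

```python
import csv
import io
import json

def export_entities(entities, fmt="json"):
    """Serialize entities for the frontend: JSON for graph
    visualization, CSV for tabular search tooling."""
    if fmt == "json":
        return json.dumps({"entities": entities}, indent=2)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "type", "name"])
    writer.writeheader()
    writer.writerows(entities)
    return buf.getvalue()
```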

Entity types

Type           ID prefix
Person         P-0001
Organization   O-0001
Work           W-0001
Concept        C-0001
Event          E-0001
Location       L-0001

Relationship types

Content relationships

  • Mentions
  • Criticizes
  • Cites
  • Discusses
  • Expands On
  • Supports

Historical relationships

  • Founded By
  • Pioneered
  • Inspired By

Structural relationships

  • Affiliated With
  • Published In
  • Originated By
  • Occurred At
  • Owns / Owned By
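The relationship vocabulary above can be enforced when building graph edges. A minimal sketch, assuming snake_case relation keys and a simple source/relation/target edge shape (both assumptions):

```python
CONTENT = {"mentions", "criticizes", "cites", "discusses", "expands_on", "supports"}
HISTORICAL = {"founded_by", "pioneered", "inspired_by"}
STRUCTURAL = {"affiliated_with", "published_in", "originated_by",
              "occurred_at", "owns", "owned_by"}
ALL_RELATIONS = CONTENT | HISTORICAL | STRUCTURAL

def make_edge(source_id: str, relation: str, target_id: str) -> dict:
    """Build one knowledge-graph edge, rejecting unknown relation types."""
    if relation not in ALL_RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    return {"source": source_id, "relation": relation, "target": target_id}
```

Constraining edges to a closed relation set keeps AI-extracted relationships queryable and catches invented relation names early.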

Record ID prefixes

Source                       Prefix
New York Times               NYT
Columbia Journalism Review   CJR
PressThink                   PT
Twitter                      TW
YouTube                      YT
Newspaper (OCR)              NEWS
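A small sketch of issuing record IDs from these prefixes, assuming sequential zero-padded counters per source (the numbering scheme is an assumption):

```python
SOURCE_PREFIX = {
    "New York Times": "NYT",
    "Columbia Journalism Review": "CJR",
    "PressThink": "PT",
    "Twitter": "TW",
    "YouTube": "YT",
    "Newspaper (OCR)": "NEWS",
}

def next_record_id(source: str, counters: dict) -> str:
    """Issue the next sequential ID for a source, e.g. NYT-0001, NYT-0002."""
    prefix = SOURCE_PREFIX[source]
    counters[prefix] = counters.get(prefix, 0) + 1
    return f"{prefix}-{counters[prefix]:04d}"
```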

Installation

# Clone the repository
git clone https://github.com/jamditis/claude-skills-journalism.git

# Copy the skill to your Claude config
cp -r claude-skills-journalism/digital-archive ~/.claude/skills/

Or download just this skill from the GitHub repository.
