Data collection
Social media data is collected by automated scrapers that gather publicly available information from each platform. The scrapers are designed to respect each platform's terms of service and rate limits.
Platforms tracked
TikTok
Video content, likes, comments, shares, views
Posts, likes, comments, followers
Bluesky
Posts, likes, reposts, replies, followers
YouTube
Videos, views, likes, comments
X (Twitter)
Posts, likes, retweets, replies
Posts, reactions, comments, shares
Posts, reactions, comments, followers
Metrics definitions
Total engagement
The sum of all interactions across posts: likes + comments + shares/reposts. Views are tracked separately as they represent reach rather than active engagement.
Engagement rate
Average engagement per post: Total Engagement / Number of Posts. Dividing by the post count allows grantees that post frequently to be compared with those that post rarely.
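The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `Post` record and its field names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical record; field names are assumptions, not the project's schema.
    likes: int = 0
    comments: int = 0
    shares: int = 0  # shares or reposts, depending on the platform
    views: int = 0   # reach, deliberately excluded from engagement


def total_engagement(posts: list[Post]) -> int:
    """Sum likes + comments + shares/reposts across all posts."""
    return sum(p.likes + p.comments + p.shares for p in posts)


def engagement_rate(posts: list[Post]) -> float:
    """Average engagement per post; 0.0 when there are no posts."""
    if not posts:
        return 0.0
    return total_engagement(posts) / len(posts)


posts = [
    Post(likes=10, comments=2, shares=3, views=500),
    Post(likes=4, comments=1, shares=0, views=120),
]
print(total_engagement(posts))  # 20
print(engagement_rate(posts))   # 10.0
```

Views are carried on the record but never added to the total, which mirrors the distinction between reach and active engagement.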
Total posts
The number of posts scraped from each platform. We collect up to the 25 most recent posts per platform per grantee during each scraping cycle.
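As an illustration of the 25-post cap, the sketch below keeps only the newest posts from a scraped list. The `created_at` key and the `most_recent` helper are hypothetical names chosen for the example; the real scraper may apply the limit differently, for instance by stopping pagination early.

```python
from datetime import datetime, timezone

MAX_POSTS_PER_PLATFORM = 25  # cap stated in the methodology


def most_recent(posts: list[dict], limit: int = MAX_POSTS_PER_PLATFORM) -> list[dict]:
    """Return up to `limit` posts, newest first, using the 'created_at' datetime."""
    ordered = sorted(posts, key=lambda p: p["created_at"], reverse=True)
    return ordered[:limit]


# Thirty sample posts, one per day in January 2025.
sample = [
    {"id": i, "created_at": datetime(2025, 1, 1 + i, tzinfo=timezone.utc)}
    for i in range(30)
]
kept = most_recent(sample)
print(len(kept))      # 25
print(kept[0]["id"])  # 29, the newest post
```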
Followers
The number of followers/subscribers on each platform. This metric is available where the platform's public API provides it (e.g., Bluesky).
Data processing
After collection, data goes through several processing steps:
- Aggregation: Posts from each grantee are combined across platforms
- Normalization: Engagement metrics are standardized across different platform formats
- Time Series: Posts are grouped by week to identify trends
- Ranking: Grantees are ranked by various metrics (engagement, posts, rate)
- Content Analysis: Posts are categorized by type (video, image, link, text)
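The time-series and ranking steps listed above might look roughly like the following sketch. It assumes weeks start on Monday and that each post has already been reduced to a date and a single engagement number; both are assumptions for the example, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_engagement(posts: list[dict]) -> dict[date, int]:
    """Sum engagement per week, keyed by the Monday that starts the week."""
    totals: dict[date, int] = defaultdict(int)
    for post in posts:
        totals[week_start(post["date"])] += post["engagement"]
    return dict(sorted(totals.items()))


def rank_grantees(totals: dict[str, int]) -> list[tuple[str, int]]:
    """Order grantees from highest to lowest on a single metric."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


weekly_posts = [
    {"date": date(2025, 1, 6), "engagement": 12},  # a Monday
    {"date": date(2025, 1, 8), "engagement": 5},   # same week
    {"date": date(2025, 1, 13), "engagement": 7},  # following Monday
]
weekly = weekly_engagement(weekly_posts)
ranking = rank_grantees({"Grantee A": 17, "Grantee B": 42})
print(weekly[date(2025, 1, 6)])  # 17
print(ranking[0])                # ('Grantee B', 42)
```

The same `rank_grantees` helper works for any of the ranking metrics (engagement, posts, rate), since it only needs one number per grantee.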
Update schedule
Weekly updates
Data is refreshed automatically every Monday at 7:00 AM ET via GitHub Actions. The scraper runs for approximately 30-60 minutes to collect fresh data from all platforms.
Manual updates can also be triggered as needed for special analysis.
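One detail worth keeping in mind with this schedule: GitHub Actions evaluates scheduled cron expressions in UTC, so 7:00 AM Eastern corresponds to a different UTC hour depending on daylight saving time. The sketch below shows the conversion using the standard library. The helper names are hypothetical, and whether the project's workflow adjusts for daylight saving time is not stated here.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def utc_hour_for_7am_eastern(year: int, month: int, day: int) -> int:
    """Return the UTC hour that matches 7:00 AM Eastern on the given date."""
    local = datetime(year, month, day, 7, 0, tzinfo=EASTERN)
    return local.astimezone(timezone.utc).hour


def monday_cron(utc_hour: int) -> str:
    """Build a cron expression for Mondays at the given UTC hour, minute 0."""
    return f"0 {utc_hour} * * 1"


winter = utc_hour_for_7am_eastern(2025, 1, 6)  # Monday during standard time
summer = utc_hour_for_7am_eastern(2025, 7, 7)  # Monday during daylight time
print(monday_cron(winter))  # 0 12 * * 1
print(monday_cron(summer))  # 0 11 * * 1
```

A single fixed cron expression will therefore run at 7:00 AM Eastern for only part of the year and an hour off for the rest.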
Limitations
Important considerations
- Sample Size: We collect up to 25 recent posts per platform, which may not represent complete historical performance.
- Engagement Data Availability: Some platforms (Facebook, LinkedIn) expose limited public engagement data, so their engagement counts may be understated or zero.
- Missing Accounts: Grantees without public social media handles on tracked platforms will not appear in the data.
- Engagement Timing: Engagement metrics are captured at the time of scraping; posts that are still gaining traction may accumulate more engagement afterward.
- Platform Changes: Social media platforms may change their APIs or terms, affecting data collection.
Technical details
Stack: Python 3.11, Playwright, BeautifulSoup
Frontend: Vanilla JavaScript, Tailwind CSS, Chart.js
Hosting: GitHub Pages
Automation: GitHub Actions (weekly cron job)
Data Format: JSON (dashboard-data.json, grantee-specific files)
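To illustrate the JSON output, the sketch below writes a small payload to `dashboard-data.json` and reads it back. The keys inside the payload (`grantees`, `total_engagement`, `total_posts`, `engagement_rate`) are a hypothetical shape invented for this example; the real file's schema is not documented here.

```python
import json
import tempfile
from pathlib import Path


def write_dashboard_data(payload: dict, directory: Path) -> Path:
    """Serialize the payload as indented JSON and return the file path."""
    path = directory / "dashboard-data.json"
    path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    return path


# Hypothetical payload shape, for illustration only.
payload = {
    "grantees": {
        "Example Grantee": {
            "total_engagement": 20,
            "total_posts": 2,
            "engagement_rate": 10.0,
        }
    }
}

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    written = write_dashboard_data(payload, Path(tmp))
    loaded = json.loads(written.read_text(encoding="utf-8"))

print(loaded["grantees"]["Example Grantee"]["engagement_rate"])  # 10.0
```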
The source code for this project is available on GitHub.
Questions or feedback?
If you have questions about the methodology, notice data discrepancies, or would like to request additional features, please reach out to the Center for Cooperative Media.