Methodology

How we collect, process, and present social media data for NJCIC grantees.

Data collection

Social media data is collected using automated scrapers that gather publicly available information from each platform. We aim to respect each platform's terms of service and rate limits.

Platforms tracked

TikTok

Video content, likes, comments, shares, views

Instagram

Posts, likes, comments, followers

Bluesky

Posts, likes, reposts, replies, followers

YouTube

Videos, views, likes, comments

X (Twitter)

Posts, likes, retweets, replies

Facebook

Posts, reactions, comments, shares

LinkedIn

Posts, reactions, comments, followers
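Each platform labels its interactions differently (reactions on Facebook and LinkedIn, reposts on Bluesky, retweets on X), so these counts have to be mapped onto shared fields before they can be compared. The sketch below shows one way to do that. The lowercase platform keys, the canonical field names `likes`, `comments`, and `shares`, and the choice to count reactions as likes are illustrative assumptions, not the project's actual schema.

```python
# Hypothetical mapping from each platform's metric names to shared fields.
# Source keys follow the platform list; canonical names are assumptions.
FIELD_MAP: dict[str, dict[str, str]] = {
    "tiktok":    {"likes": "likes", "comments": "comments", "shares": "shares"},
    "instagram": {"likes": "likes", "comments": "comments"},
    "bluesky":   {"likes": "likes", "replies": "comments", "reposts": "shares"},
    "youtube":   {"likes": "likes", "comments": "comments"},
    "x":         {"likes": "likes", "replies": "comments", "retweets": "shares"},
    "facebook":  {"reactions": "likes", "comments": "comments", "shares": "shares"},
    "linkedin":  {"reactions": "likes", "comments": "comments"},
}

def normalize_post(platform: str, raw: dict) -> dict:
    """Map one scraped post onto the shared fields; absent counts default to 0."""
    post = {"likes": 0, "comments": 0, "shares": 0}
    for source_key, canonical_key in FIELD_MAP[platform].items():
        post[canonical_key] = int(raw.get(source_key) or 0)
    # Views measure reach rather than engagement, so they are carried separately
    # and left as None when a platform does not report them.
    post["views"] = raw.get("views")
    return post

print(normalize_post("x", {"likes": 12, "replies": 3, "retweets": 4}))
# {'likes': 12, 'comments': 3, 'shares': 4, 'views': None}
```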

Metrics definitions

Total engagement

The sum of all interactions across posts: likes + comments + shares/reposts. Views are tracked separately as they represent reach rather than active engagement.
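A minimal sketch of the total engagement formula, assuming each post has already been reduced to `likes`, `comments`, and `shares` counts (hypothetical field names). The sample posts include views, but they are deliberately left out of the sum.

```python
def total_engagement(posts: list[dict]) -> int:
    """Sum likes + comments + shares across posts; views are excluded."""
    return sum(
        p.get("likes", 0) + p.get("comments", 0) + p.get("shares", 0)
        for p in posts
    )

posts = [
    {"likes": 120, "comments": 8, "shares": 5, "views": 4000},
    {"likes": 40, "comments": 2, "shares": 0, "views": 900},
]
print(total_engagement(posts))  # 175
```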

Engagement rate

The average engagement per post: Total Engagement / Number of Posts. Because it is a per-post average rather than a percentage of followers, it lets grantees that post at different frequencies be compared directly.
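The engagement rate is a plain division, sketched below. Returning 0.0 when no posts were collected is an assumption made for illustration, not documented behavior.

```python
def engagement_rate(total_engagement: int, num_posts: int) -> float:
    """Average engagement per post (Total Engagement / Number of Posts)."""
    # Assumed behavior: a grantee with no collected posts gets 0.0
    # instead of raising ZeroDivisionError.
    if num_posts == 0:
        return 0.0
    return total_engagement / num_posts

# 10 posts with 500 interactions average less per post than
# 2 posts with 175 interactions, even though the total is higher.
print(engagement_rate(500, 10))  # 50.0
print(engagement_rate(175, 2))   # 87.5
```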

Total posts

The number of posts scraped from each platform. We collect up to the 25 most recent posts per platform per grantee during each scraping cycle.
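If a scraper returns more posts than the cap, trimming to the 25 most recent could look like the sketch below. The `published_at` field and its ISO 8601 format are hypothetical; the real scrapers may instead stop requesting posts once they reach the limit.

```python
from datetime import datetime

MAX_POSTS = 25  # per platform, per grantee, per scraping cycle

def most_recent(posts: list[dict], limit: int = MAX_POSTS) -> list[dict]:
    """Keep at most `limit` posts, newest first, ordered by 'published_at'."""
    return sorted(
        posts,
        key=lambda p: datetime.fromisoformat(p["published_at"]),
        reverse=True,
    )[:limit]

# Thirty daily posts in May 2024; only the newest 25 (May 30 back to May 6) are kept.
posts = [{"id": i, "published_at": f"2024-05-{i:02d}T09:00:00"} for i in range(1, 31)]
recent = most_recent(posts)
print(len(recent), recent[0]["id"], recent[-1]["id"])  # 25 30 6
```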

Followers

The number of followers/subscribers on each platform. This metric is reported only where a platform exposes it publicly, for example through Bluesky's public API.

Data processing

After collection, data goes through several processing steps:

  1. Aggregation - Posts from each grantee are combined across platforms
  2. Normalization - Engagement metrics are standardized across different platform formats
  3. Time Series - Posts are grouped by week to identify trends
  4. Ranking - Grantees are ranked by various metrics (total engagement, number of posts, engagement rate)
  5. Content Analysis - Posts are categorized by type (video, image, link, text)
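The time series and ranking steps can be sketched as follows, assuming normalized posts carry a hypothetical `date` field (YYYY-MM-DD) plus shared `likes`, `comments`, and `shares` counts. Grouping by ISO week (weeks start on Monday) and giving tied grantees the same rank (1, 2, 2, 4) are illustrative choices, not documented behavior.

```python
from collections import defaultdict
from datetime import date

def weekly_engagement(posts: list[dict]) -> dict[str, int]:
    """Group posts by ISO week (e.g. '2024-W18') and sum their engagement."""
    weeks: dict[str, int] = defaultdict(int)
    for p in posts:
        year, week, _ = date.fromisoformat(p["date"]).isocalendar()
        weeks[f"{year}-W{week:02d}"] += p["likes"] + p["comments"] + p["shares"]
    # Zero-padded keys sort chronologically as strings.
    return dict(sorted(weeks.items()))

def rank_grantees(scores: dict[str, float]) -> list[tuple[int, str, float]]:
    """Rank grantees by one metric, highest first; tied values share a rank."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranked: list[tuple[int, str, float]] = []
    for position, (name, value) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == value:
            rank = ranked[-1][0]  # same value as the previous grantee: reuse its rank
        else:
            rank = position
        ranked.append((rank, name, value))
    return ranked

posts = [
    {"date": "2024-04-28", "likes": 5, "comments": 0, "shares": 0},  # Sunday, week 17
    {"date": "2024-04-29", "likes": 1, "comments": 0, "shares": 0},  # Monday, week 18
    {"date": "2024-05-01", "likes": 2, "comments": 1, "shares": 0},  # Wednesday, week 18
]
print(weekly_engagement(posts))  # {'2024-W17': 5, '2024-W18': 4}
print(rank_grantees({"Grantee A": 87.5, "Grantee B": 50.0, "Grantee C": 50.0}))
# [(1, 'Grantee A', 87.5), (2, 'Grantee B', 50.0), (2, 'Grantee C', 50.0)]
```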

Update schedule

Weekly updates

Data is refreshed automatically every Monday at 7:00 AM ET via GitHub Actions. The scraper runs for approximately 30-60 minutes to collect fresh data from all platforms.
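GitHub Actions interprets `schedule` cron expressions in UTC, so 7:00 AM ET corresponds to 11:00 UTC while daylight saving time is in effect and 12:00 UTC otherwise. The sketch below computes that hour from the US rule (daylight time from the second Sunday in March to the first Sunday in November). The function names are hypothetical, and `zoneinfo.ZoneInfo("America/New_York")` is the more robust choice when time zone data is installed.

```python
from datetime import date, timedelta

def nth_sunday(year: int, month: int, n: int) -> date:
    """Return the n-th Sunday of the given month."""
    first = date(year, month, 1)
    # weekday(): Monday == 0 ... Sunday == 6
    first_sunday = first + timedelta(days=(6 - first.weekday()) % 7)
    return first_sunday + timedelta(weeks=n - 1)

def utc_hour_for_7am_eastern(day: date) -> int:
    """UTC hour equivalent to 7:00 AM Eastern Time on `day` (US rules since 2007)."""
    dst_start = nth_sunday(day.year, 3, 2)   # second Sunday in March
    dst_end = nth_sunday(day.year, 11, 1)    # first Sunday in November
    in_dst = dst_start <= day < dst_end      # both switches happen at 2:00 AM, before 7:00 AM
    return 7 + (4 if in_dst else 5)          # EDT is UTC-4, EST is UTC-5

for monday in (date(2024, 7, 1), date(2024, 12, 2)):
    hour = utc_hour_for_7am_eastern(monday)
    print(monday, f"cron: '0 {hour} * * 1'")
# 2024-07-01 cron: '0 11 * * 1'
# 2024-12-02 cron: '0 12 * * 1'
```

Because a single cron line fires at a fixed UTC hour, a workflow pinned to one of these values runs at 7:00 AM ET for only part of the year and an hour off for the rest.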

Manual updates can also be triggered as needed for special analysis.

Limitations

Important considerations

  • Sample Size: We collect up to 25 recent posts per platform, which may not represent complete historical performance.
  • Engagement Data Availability: Some platforms (Facebook, LinkedIn) have limited public engagement data, resulting in lower or zero engagement counts.
  • Missing Accounts: Grantees without public social media handles on tracked platforms will not appear in the data.
  • Engagement Timing: Engagement metrics are captured at the time of scraping; counts on recent or viral posts may keep growing after collection.
  • Platform Changes: Social media platforms may change their APIs or terms, affecting data collection.

Technical details

Stack: Python 3.11, Playwright, BeautifulSoup

Frontend: Vanilla JavaScript, Tailwind CSS, Chart.js

Hosting: GitHub Pages

Automation: GitHub Actions (weekly cron job)

Data Format: JSON (dashboard-data.json, grantee-specific files)

The source code for this project is available on GitHub.

Questions or feedback?

If you have questions about the methodology, notice data discrepancies, or would like to request additional features, please reach out to the Center for Cooperative Media.
