
CHANGELOG

What's new in Scrapefruit. Features, fixes, and improvements.

Feature: New functionality
Fix: Bug fixes
Improvement: Enhancements

January 2026

Feature Jan 16

Local LLM integration via Ollama

Process scraped content with local LLMs: summarize articles, extract entities (people, organizations, dates), and classify content, all without API costs when running locally.

  • Auto-detects Ollama running locally
  • Falls back to OpenAI/Anthropic if API keys are set
  • Works with small models like qwen2.5:0.5b (~400 MB)
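
The provider selection could look roughly like the Python sketch below. It assumes Ollama's default local address (`http://localhost:11434`, whose `/api/tags` endpoint lists installed models) and the conventional `OPENAI_API_KEY` / `ANTHROPIC_API_KEY` environment variables. The function names and the OpenAI-before-Anthropic order are hypothetical, not Scrapefruit's actual code.

```python
import os
import urllib.request
from typing import Callable, Mapping, Optional

# Assumption: Ollama's default local address; GET /api/tags lists installed models.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def ollama_is_running(url: str = OLLAMA_TAGS_URL, timeout: float = 0.5) -> bool:
    """Return True if a server answers with HTTP 200 at the Ollama endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, refused connections, and timeouts all subclass OSError
        return False


def pick_llm_provider(
    probe: Callable[[], bool] = ollama_is_running,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical order: local Ollama first, then hosted APIs whose keys are set."""
    env = os.environ if env is None else env
    if probe():
        return "ollama"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    return None  # no LLM backend available
```

Injecting `probe` and `env` keeps the selection logic testable without a running server.
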
Feature Jan 16

Video transcription via yt-dlp + Whisper

Download and transcribe videos from YouTube, Twitter/X, TikTok, and 1000+ other platforms. Output plain-text transcripts, SRT subtitles, or word-level timestamps.

  • Optional 2x-speed processing roughly halves transcription time
  • Multiple Whisper model sizes (tiny → large)
  • Automatic timestamp normalization
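
When audio is sped up before transcription, Whisper's timestamps refer to the faster audio, so mapping them back to the original video presumably means multiplying by the speed factor (a segment starting at 1.25 s in 2x audio starts at 2.5 s in the source). A minimal sketch of that normalization plus SRT formatting, assuming segments arrive as start/end seconds with text; the `Segment` type and the function names are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds, relative to the audio Whisper actually heard
    end: float
    text: str


def normalize(segments: List[Segment], speed: float = 2.0) -> List[Segment]:
    """Map timestamps from sped-up audio back to original video time."""
    return [Segment(s.start * speed, s.end * speed, s.text) for s in segments]


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render numbered SRT cues separated by blank lines."""
    cues = [
        f"{i}\n{srt_time(seg.start)} --> {srt_time(seg.end)}\n{seg.text.strip()}\n"
        for i, seg in enumerate(segments, start=1)
    ]
    return "\n".join(cues)
```

Normalizing before formatting keeps every output format (plain text, SRT, word-level) on the original video's timeline.
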
Improvement Jan 15

Toast notifications and live ETA tracking

Real-time toast notifications for job events. Live ETA tracking estimates the time remaining from the average processing time per URL.
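
The estimate amounts to ETA = (URLs remaining) × (mean seconds per completed URL). A minimal sketch, assuming per-URL durations are recorded in seconds; `eta_seconds` is a hypothetical name.

```python
from typing import Optional, Sequence


def eta_seconds(durations: Sequence[float], total_urls: int) -> Optional[float]:
    """ETA = URLs remaining x mean seconds per completed URL (None until one finishes)."""
    done = len(durations)
    if done == 0:
        return None  # no timing data yet
    average = sum(durations) / done
    remaining = max(total_urls - done, 0)
    return remaining * average
```

For example, two finished URLs that took 2 s and 4 s in a 10-URL job give 8 × 3 s = 24 s remaining. A plain average reacts slowly when conditions change mid-job; a moving average over recent URLs is a common alternative.
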

Feature Jan 14

Retry failed URLs and job reset

New "Retry Failed" button to re-attempt only the URLs that failed. "Reset Job" clears all results and starts fresh without recreating the job.
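
One way to model the two actions, assuming each URL carries a status of `pending`, `success`, or `failed`. The `Job` class and its fields are hypothetical, not Scrapefruit's data model: "Retry Failed" re-queues only failed URLs and keeps existing results, while "Reset Job" clears results but keeps the job's URL list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Job:
    urls: List[str]
    status: Dict[str, str] = field(default_factory=dict)  # url -> pending/success/failed
    results: Dict[str, object] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for url in self.urls:
            self.status.setdefault(url, "pending")

    def retry_failed(self) -> List[str]:
        """Re-queue only failed URLs; results for successful URLs are kept."""
        failed = [u for u in self.urls if self.status[u] == "failed"]
        for url in failed:
            self.status[url] = "pending"
        return failed

    def reset(self) -> None:
        """Clear all results and statuses but keep the job itself (its URL list)."""
        self.results.clear()
        for url in self.urls:
            self.status[url] = "pending"
```
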

December 2025

Feature Dec 20

HTML auto-analyzer for extraction rules

Automatically detect extraction rules from sample HTML. Works best on modern sites with semantic markup and Open Graph tags.
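
A rough sketch of how such an analyzer could propose rules, using only Python's standard `html.parser`. It looks for Open Graph `<meta property="og:...">` tags plus two semantic hints (`<h1>` and `<time datetime>`). The rule format (a CSS selector plus the attribute to read) and the field names are assumptions for illustration.

```python
from html.parser import HTMLParser
from typing import Dict, Optional

Rule = Dict[str, Optional[str]]


class RuleSuggester(HTMLParser):
    """Propose extraction rules (CSS selector + attribute) from sample HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.rules: Dict[str, Rule] = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # values may be None for bare attributes
        if tag == "meta":
            prop = a.get("property") or ""
            if prop.startswith("og:") and a.get("content"):
                # og:title -> field "title", read from the content attribute
                self.rules.setdefault(
                    prop[3:], {"selector": f'meta[property="{prop}"]', "attr": "content"}
                )
        elif tag == "time" and a.get("datetime"):
            self.rules.setdefault("published", {"selector": "time[datetime]", "attr": "datetime"})
        elif tag == "h1":
            # attr None means "use the element's text"; first rule found wins
            self.rules.setdefault("title", {"selector": "h1", "attr": None})


def suggest_rules(html: str) -> Dict[str, Rule]:
    parser = RuleSuggester()
    parser.feed(html)
    parser.close()
    return parser.rules
```

Because `setdefault` keeps the first rule found, an `og:title` in `<head>` takes precedence over a later `<h1>`, which matches the note that the analyzer works best on pages with Open Graph tags.
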

Improvement Dec 15

Activity log export

Export the activity log as JSON for debugging and record-keeping. Includes all events with timestamps.
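
A sketch of what such an export might produce, assuming each event is a dict with a timezone-aware `datetime` under `timestamp`. The payload shape (`exported_at`, `events`) is hypothetical, not the actual export format.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional


def export_activity_log(events: List[Dict], path: Optional[str] = None) -> str:
    """Serialize events to JSON, oldest first, with ISO 8601 timestamps."""
    ordered = sorted(events, key=lambda e: e["timestamp"])
    payload = {
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "events": [{**e, "timestamp": e["timestamp"].isoformat()} for e in ordered],
    }
    text = json.dumps(payload, indent=2)
    if path is not None:  # optionally write the export to disk
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
    return text
```
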

Feature Dec 10

Fetch samples feature

Test your extraction rules on sample URLs before running a full job. See what data you'll get without committing to a full scrape.

Fix Dec 5

Playwright stealth mode improvements

Better handling of Cloudflare challenges. Improved user agent rotation and fingerprint randomization.
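
User-agent rotation can be as simple as cycling through a pool and pairing each new browser context with a randomly chosen viewport. The sketch below only builds keyword arguments: `user_agent` and `viewport` are real parameters of Playwright's `browser.new_context()`, but the pool, the viewport list, and the `ContextRotator` class are illustrative assumptions. It does not attempt to solve Cloudflare challenges.

```python
import itertools
import random
from typing import List, Optional

# Illustrative pool; a real deployment would keep these strings current.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]
VIEWPORTS = [(1920, 1080), (1536, 864), (1440, 900), (1366, 768)]


class ContextRotator:
    """Round-robin user agents, each paired with a randomly chosen common viewport."""

    def __init__(self, user_agents: List[str] = USER_AGENTS, seed: Optional[int] = None):
        self._agents = itertools.cycle(user_agents)
        self._rng = random.Random(seed)  # seedable for reproducible runs

    def next_context_options(self) -> dict:
        width, height = self._rng.choice(VIEWPORTS)
        return {"user_agent": next(self._agents), "viewport": {"width": width, "height": height}}
```

With Playwright's sync API, each new context could then be created as `browser.new_context(**rotator.next_context_options())`.
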

November 2025

Feature Nov 25

Cascade scraping engine

The core cascade system automatically escalates through scraping methods, in order, when one fails: HTTP → Playwright → Puppeteer → Agent-browser → Browser-use.

  • Detects blocked status codes (403, 429, 503)
  • Recognizes anti-bot patterns (Cloudflare, CAPTCHA)
  • Per-job cascade configuration
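
The escalation loop can be sketched as follows, assuming each method is a callable that returns `(status_code, body)`. The blocked status codes are the ones listed above; the anti-bot marker strings are illustrative guesses ("Just a moment..." is the title of Cloudflare's challenge page), and `cascade_fetch` is a hypothetical name. Passing a different `methods` list per job is one way to get per-job configuration.

```python
from typing import Callable, List, Optional, Tuple

BLOCKED_STATUS = {403, 429, 503}
# Illustrative markers only; real detection would need more patterns.
ANTI_BOT_MARKERS = ("just a moment...", "attention required! | cloudflare", "captcha")

Fetcher = Callable[[str], Tuple[int, str]]


def looks_blocked(status: int, body: str) -> bool:
    """True for blocked status codes or bodies containing anti-bot markers."""
    if status in BLOCKED_STATUS:
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in ANTI_BOT_MARKERS)


def cascade_fetch(url: str, methods: List[Tuple[str, Fetcher]]) -> Tuple[Optional[str], Optional[str]]:
    """Try each method in order; return (method_name, body) for the first unblocked response."""
    for name, fetch in methods:
        try:
            status, body = fetch(url)
        except Exception:
            continue  # treat a crash like a block and escalate to the next method
        if not looks_blocked(status, body):
            return name, body
    return None, None  # every method was blocked or failed
```
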
Feature Nov 20

Poison pill detection

Automatic detection of paywalls, rate limiting, anti-bot walls, dead links, and login requirements, so jobs don't waste time scraping pages with no usable content.
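
A minimal, rule-based sketch of such a classifier. The status-code checks follow standard HTTP meanings (404/410 for gone pages, 429 for rate limiting); the text patterns and the `detect_poison_pill` name are illustrative guesses, not the patterns Scrapefruit ships with.

```python
import re
from typing import Optional

# Illustrative patterns per category, checked in insertion order.
POISON_PATTERNS = {
    "paywall": re.compile(r"subscribe to (continue|read)|for subscribers only", re.I),
    "login_required": re.compile(r"(sign|log) in to (continue|view)", re.I),
    "anti_bot": re.compile(r"verify you are (a )?human|captcha|just a moment", re.I),
    "rate_limited": re.compile(r"too many requests|rate limit", re.I),
}


def detect_poison_pill(status: int, text: str) -> Optional[str]:
    """Return a poison-pill label, or None if the page looks usable."""
    if status in (404, 410):
        return "dead_link"
    if status == 429:
        return "rate_limited"
    for label, pattern in POISON_PATTERNS.items():
        if pattern.search(text):
            return label
    return None
```
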

Feature Nov 15

Vision/OCR extraction fallback

When DOM extraction fails, automatically capture screenshots and use Tesseract OCR to extract text. Useful for image-heavy or anti-scraping sites.
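
The fallback decision can be sketched with the three steps passed in as functions. The 200-character threshold for "extraction failed" and the `extract_text` name are assumptions. In practice `screenshot` might navigate with Playwright and return `page.screenshot(full_page=True)` (PNG bytes), and `ocr` might wrap `pytesseract.image_to_string` on a PIL image; neither dependency is needed to run the sketch.

```python
from typing import Callable, Tuple

MIN_CHARS = 200  # hypothetical threshold below which DOM extraction counts as failed


def extract_text(
    url: str,
    dom_extract: Callable[[str], str],
    screenshot: Callable[[str], bytes],
    ocr: Callable[[bytes], str],
    min_chars: int = MIN_CHARS,
) -> Tuple[str, str]:
    """Return ("dom", text) when DOM extraction yields enough text, else ("ocr", text)."""
    text = (dom_extract(url) or "").strip()
    if len(text) >= min_chars:
        return "dom", text
    image = screenshot(url)  # only pay for a screenshot when the DOM came up short
    return "ocr", ocr(image).strip()
```
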

Feature Nov 10

Google Sheets export

Export scraping results directly to Google Sheets for collaboration and analysis. Requires service account credentials.

Feature Nov 1

Initial release

First public release of Scrapefruit. Visual interface, basic HTTP and Playwright scraping, CSS/XPath extraction, SQLite storage.

Stay updated

Watch the repository on GitHub to get notified of new releases.
