CHANGELOG
What's new in Scrapefruit. Features, fixes, and improvements.
January 2026
Local LLM integration via Ollama
Process scraped content with local LLMs: summarize articles, extract entities (people, organizations, dates), and classify content, all without API costs.
- Auto-detects Ollama running locally
- Falls back to OpenAI/Anthropic if API keys are set
- Works with small models like qwen2.5:0.5b (about 400 MB)
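The detect-then-fall-back behavior above could look roughly like the sketch below. It is an illustration, not Scrapefruit's actual code: the function names, the environment variable names (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`), and the OpenAI-before-Anthropic order are assumptions. Port 11434 is Ollama's default.

```python
import socket

def ollama_is_listening(host="localhost", port=11434, timeout=0.5):
    """Return True if something accepts TCP connections on Ollama's default port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def pick_provider(ollama_up, env):
    """Choose an LLM provider: local Ollama first, then hosted APIs.

    `env` is a mapping such as os.environ. The key names and the
    fallback order are assumptions made for this sketch.
    """
    if ollama_up:
        return "ollama"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    return None  # no provider available
```

A caller would typically combine the two: `pick_provider(ollama_is_listening(), os.environ)`.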
Video transcription via yt-dlp + Whisper
Extract and transcribe videos from YouTube, Twitter/X, TikTok, and 1000+ platforms. Get plain text transcripts, SRT subtitles, or word-level timestamps.
- 2x speed processing option halves transcription time
- Multiple Whisper model sizes (tiny → large)
- Automatic timestamp normalization
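To show how a transcript turns into SRT subtitles, here is a minimal sketch. It assumes each segment is a dict with `start` and `end` times in seconds plus a `text` field, which is the shape Whisper-style transcribers commonly return. The helper names are hypothetical and this is not necessarily how Scrapefruit formats its output.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = max(0, round(seconds * 1000))  # work in whole milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments):
    """Render segments as SRT: index line, time range line, then the text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_range}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)  # a blank line separates consecutive cues
```

Working in integer milliseconds avoids float formatting surprises at cue boundaries.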
Toast notifications and live ETA tracking
Real-time toast notifications for job events. Live ETA tracking shows estimated time remaining based on average URL processing time.
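The ETA described above is an average multiplied by the work left. A minimal sketch, assuming per-URL durations are recorded in seconds (the function name is hypothetical):

```python
def eta_seconds(completed_durations, remaining_urls):
    """Estimate time remaining from the average per-URL processing time.

    Returns None until at least one URL has finished, because there is
    no average to extrapolate from yet.
    """
    if remaining_urls <= 0:
        return 0.0
    if not completed_durations:
        return None
    average = sum(completed_durations) / len(completed_durations)
    return average * remaining_urls
```

For example, two finished URLs that took 2 s and 4 s with 10 URLs left give an estimate of 30 s.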
Retry failed URLs and job reset
New "Retry Failed" button to re-attempt only the URLs that failed. "Reset Job" clears all results and starts fresh without recreating the job.
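The difference between the two actions can be sketched as follows, assuming a job stores one status per URL. The data layout and the status names are hypothetical, chosen only for illustration:

```python
def urls_to_retry(results):
    """'Retry Failed': select only the URLs whose last attempt failed."""
    return [url for url, status in results.items() if status == "failed"]

def reset_job(results):
    """'Reset Job': keep the same URL list but discard every result."""
    return {url: "pending" for url in results}
```

Retrying leaves successful results untouched, while a reset returns every URL to its initial state.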
December 2025
HTML auto-analyzer for extraction rules
Automatically detect extraction rules from sample HTML. Works best on modern sites with semantic markup and Open Graph tags.
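As an illustration of why Open Graph tags make rule detection easier, the sketch below scans sample HTML for `og:*` meta tags and proposes one CSS selector per field. The rule format (field name mapped to a selector) and the function names are assumptions, not Scrapefruit's actual rule schema.

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first occurrence of each property.
            self.properties.setdefault(prop, content)

def suggest_rules(html):
    """Map each Open Graph field to a CSS selector that would extract it."""
    parser = OpenGraphParser()
    parser.feed(html)
    return {prop[3:]: f'meta[property="{prop}"]' for prop in parser.properties}
```

Pages without such tags would need heuristics over the visible markup, which is harder to get right.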
Activity log export
Export the activity log as JSON for debugging and record-keeping. Includes all events with timestamps.
Fetch samples feature
Test your extraction rules on sample URLs before running a full job. See what data you'll get without committing to a full scrape.
Playwright stealth mode improvements
Better handling of Cloudflare challenges. Improved user agent rotation and fingerprint randomization.
November 2025
Cascade scraping engine
The core cascade system automatically escalates to the next scraping method when one fails, in this order: HTTP → Playwright → Puppeteer → Agent-browser → Browser-use.
- Detects blocked status codes (403, 429, 503)
- Recognizes anti-bot patterns (Cloudflare, CAPTCHA)
- Per-job cascade configuration
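The escalation logic can be sketched as a loop over fetchers ordered from cheapest to most capable. This is a simplified illustration, not the engine's real implementation: each fetcher is assumed to be a callable returning `(status, body)`, and the substring markers are deliberately naive.

```python
BLOCKED_STATUS = {403, 429, 503}              # status codes named in the changelog
ANTI_BOT_MARKERS = ("captcha", "cloudflare")  # hypothetical, naive substring checks

def looks_blocked(status, body):
    """Treat a response as blocked by status code or by anti-bot markers."""
    lowered = body.lower()
    return status in BLOCKED_STATUS or any(m in lowered for m in ANTI_BOT_MARKERS)

def cascade_fetch(url, methods):
    """Try each (name, fetch) pair in order until one returns usable content.

    Returns (method_name, body, attempts); the name and body are None
    if every method failed or was blocked.
    """
    attempts = []
    for name, fetch in methods:
        try:
            status, body = fetch(url)
        except Exception as exc:  # a crashed method also triggers escalation
            attempts.append((name, f"error: {exc}"))
            continue
        if looks_blocked(status, body):
            attempts.append((name, f"blocked ({status})"))
            continue
        attempts.append((name, "ok"))
        return name, body, attempts
    return None, None, attempts
```

Per-job configuration then amounts to passing a different `methods` list for each job.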
Poison pill detection
Automatic detection of paywalls, rate limiting, anti-bot walls, dead links, and login requirements. Never waste time scraping garbage content.
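A detector of this kind can be approximated by classifying each response before its content is stored. The sketch below is only an assumption about how such checks might work: the labels and trigger phrases are hypothetical, and a production detector would need far more signals.

```python
# Hypothetical trigger phrases, chosen only to illustrate the idea.
PHRASES = {
    "paywall": ("subscribe to continue", "subscribers only"),
    "login_required": ("sign in to continue", "log in to continue"),
    "anti_bot": ("verify you are human", "captcha"),
}

def detect_poison_pill(status, body):
    """Return a label for unusable content, or None if the page looks fine."""
    if status in (404, 410):
        return "dead_link"
    if status == 429:
        return "rate_limited"
    if status == 401:
        return "login_required"
    text = body.lower()
    for label, phrases in PHRASES.items():
        if any(phrase in text for phrase in phrases):
            return label
    return None
```

Status codes are checked first because they are cheaper and more reliable than matching on body text.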
Vision/OCR extraction fallback
When DOM extraction fails, automatically capture screenshots and use Tesseract OCR to extract text. Useful for image-heavy or anti-scraping sites.
Google Sheets export
Export scraping results directly to Google Sheets for collaboration and analysis. Requires service account credentials.
Initial release
First public release of Scrapefruit. Visual interface, basic HTTP and Playwright scraping, CSS/XPath extraction, SQLite storage.