SCRAPEFRUIT
A Python web application for web scraping with a visual interface. Cascade scraping, anti-bot bypass, and local LLM integration.
01 // FEATURES
CORE CAPABILITIES
Cascade scraping
Multi-method fallback system. HTTP → Playwright → Puppeteer → Agent-browser → Browser-use. Auto-detects blocks and escalates.
Anti-bot bypass
Playwright-stealth integration for handling Cloudflare, CAPTCHAs, and rate limiting. User agent rotation included.
Poison pill detection
Automatic detection of paywalls, rate limiting, anti-bot patterns, dead links, and login walls. Never scrape garbage.
Local LLM integration
Free local inference via Ollama. Summarization, entity extraction, classification. No API costs.
Video transcription
Extract and transcribe videos from YouTube, Twitter/X, TikTok, and 1000+ platforms via yt-dlp + Whisper.
Vision/OCR fallback
When DOM extraction fails, automatically capture screenshots and use Tesseract OCR to extract text.
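The poison pill detection described above could be sketched roughly as follows. This is a minimal illustration, not Scrapefruit's actual implementation: the `POISON_PATTERNS` table, the `PoisonVerdict` class, and the `detect_poison_pill` function are hypothetical names, and the phrases in each pattern are guesses at what such pages typically contain.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern table. The categories come from the feature list above;
# the phrases themselves are illustrative assumptions.
POISON_PATTERNS = {
    "paywall": re.compile(r"subscribe to (continue|keep) reading|subscribers only", re.I),
    "rate_limit": re.compile(r"too many requests|rate limit exceeded", re.I),
    "anti_bot": re.compile(r"verify you are human|checking your browser|captcha", re.I),
    "login_wall": re.compile(r"(sign|log) in to (continue|view)", re.I),
    "dead_link": re.compile(r"page not found|404 not found", re.I),
}


@dataclass
class PoisonVerdict:
    """Result of a check: `kind` is None when the page looks usable."""
    kind: Optional[str]

    @property
    def is_poison(self) -> bool:
        return self.kind is not None


def detect_poison_pill(text: str, status_code: int = 200) -> PoisonVerdict:
    """Classify a fetched page as usable or as one of the poison categories."""
    # Status codes give the cheapest signal, so check them first.
    if status_code in (404, 410):
        return PoisonVerdict("dead_link")
    if status_code == 429:
        return PoisonVerdict("rate_limit")
    # Otherwise fall back to scanning the page text for known phrases.
    for kind, pattern in POISON_PATTERNS.items():
        if pattern.search(text):
            return PoisonVerdict(kind)
    return PoisonVerdict(None)
```

A caller would skip or flag any page where `detect_poison_pill(text, status).is_poison` is true instead of storing its content. Simple phrase matching like this can misfire on articles that merely mention CAPTCHAs or paywalls, so a real detector would likely need more context than this sketch uses.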
02 // CASCADE_SYSTEM
FALLBACK STRATEGY

| Method | Speed | JS support | Use case |
|---|---|---|---|
| HTTP | Fastest | No | Static pages, APIs |
| Playwright | Medium | Yes | JavaScript-heavy sites, stealth mode |
| Puppeteer | Medium | Yes | Alternative browser fingerprint |
| Agent-browser | Slower | Yes | AI-optimized with accessibility tree |
| Browser-use | Slowest | Yes | LLM-controlled automation |
| Video | Varies | N/A | YouTube, Twitter/X, TikTok, 1000+ sites |

The cascade escalates to the next method when it detects any of the following:
- Blocked status codes (403, 429, 503)
- Anti-bot detection patterns (Cloudflare, CAPTCHA)
- Empty or minimal content (<500 chars)
- JavaScript-heavy SPA markers
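
The fallback strategy and the escalation signals listed above can be sketched as a simple loop. This is a hedged illustration under stated assumptions, not the project's real code: `FetchResult`, `escalation_reason`, and `cascade` are hypothetical names, the marker strings are illustrative, and the 500-character threshold is applied here to the raw response body, which may differ from how Scrapefruit measures it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

BLOCKED_STATUS = {403, 429, 503}
ANTI_BOT_MARKERS = ("cloudflare", "captcha")  # naive substring markers (assumption)
SPA_MARKERS = ('<div id="root"></div>', '<div id="app"></div>')  # illustrative only
MIN_CONTENT_CHARS = 500


@dataclass
class FetchResult:
    status: int
    html: str


def escalation_reason(result: FetchResult) -> Optional[str]:
    """Return why this result should trigger a fallback, or None if it looks fine."""
    if result.status in BLOCKED_STATUS:
        return f"blocked status {result.status}"
    lowered = result.html.lower()
    if any(marker in lowered for marker in ANTI_BOT_MARKERS):
        return "anti-bot pattern"
    if any(marker in lowered for marker in SPA_MARKERS):
        return "SPA marker"
    if len(result.html.strip()) < MIN_CONTENT_CHARS:
        return "minimal content"
    return None


Fetcher = Callable[[str], FetchResult]


def cascade(url: str, methods: Iterable[Tuple[str, Fetcher]]) -> Tuple[str, FetchResult]:
    """Try each (name, fetcher) pair in order and return the first acceptable result."""
    failures = []
    for name, fetch in methods:
        result = fetch(url)
        reason = escalation_reason(result)
        if reason is None:
            return name, result
        failures.append(f"{name}: {reason}")  # remember why we escalated
    raise RuntimeError("all methods failed (" + "; ".join(failures) + ")")
```

In use, `methods` would be an ordered list such as `[("http", fetch_http), ("playwright", fetch_playwright), ...]`, where each fetcher is a function supplied by the caller. Ordering the list from fastest to slowest means the expensive browser-based methods only run when the cheaper ones are blocked.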
03 // QUICK_START
GET RUNNING
# Clone and setup
git clone https://github.com/jamditis/scrapefruit.git
cd scrapefruit
# Create virtual environment
python -m venv venv
source venv/bin/activate # on Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
playwright install
# Configure and run
cp .env.example .env
python main.py

Requirements:
- Python 3.11+
- Chromium (via Playwright)
- Tesseract OCR (optional)
- Ollama for local LLM features
- yt-dlp + faster-whisper for video transcription
- Google Sheets credentials for export
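
Before the first run, it can help to check which of the tools listed above are actually available. The following is a small preflight sketch, not part of Scrapefruit itself: `check_environment` is a hypothetical helper, and the executable names (`tesseract`, `ollama`, `yt-dlp`) are assumed to match how those tools are installed on your PATH.

```python
import shutil
import sys
from typing import Callable, Dict, Optional, Tuple

# Executable names are assumptions about a typical install.
EXTERNAL_TOOLS = ("tesseract", "ollama", "yt-dlp")


def check_environment(
    min_python: Tuple[int, int] = (3, 11),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, bool]:
    """Report whether the Python version and each external tool are available."""
    report = {"python": tuple(sys.version_info[:2]) >= min_python}
    for exe in EXTERNAL_TOOLS:
        # shutil.which returns the executable's path, or None if it is not on PATH.
        report[exe] = which(exe) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:10s} {'ok' if ok else 'missing'}")
```

The `which` parameter exists only so the lookup can be swapped out in tests. A missing tool does not necessarily prevent the app from starting; it only suggests that the corresponding feature may be unavailable.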