

Clean markdown or JSON from any URL, with 90% fewer tokens than raw HTML. A drop-in Firecrawl replacement, no headless browser.
A live page is mostly noise your agent still pays for: nav, ads, cookie walls, footer soup. Webclaw keeps the content and drops the rest. Same content, a fraction of the tokens.

Static pages come back instantly. JS-heavy sites only render when they have to, with nothing to configure.
Swap one base URL, keep your SDK. /v2 matches Firecrawl's request and response shape exactly, no rewrite.
Challenge pages, CAPTCHAs and fingerprinting handled transparently. No cookies, no per-site config.
Markdown, JSON, text, LLM-optimized. Schema & prompt extraction, summaries, diffing, across 14 endpoints.
An MCP server with 12 tools for Claude, Cursor, Codex & any client. REST for search, batch and crawl.
A 9-step pipeline strips the nav, ads and boilerplate, keeping the real content fully intact.
Give it a goal, get structured data back. The agent reasons over the page, then clicks and navigates to reach that goal.
Embedded JSON & server-rendered payloads recovered even from an empty DOM. Auto-detects PDF, DOCX, XLSX.
One credit pool covers every endpoint. Start in minutes, or self-host the open-source stack with no limits.
Cancel anytime · no lock-in · self-host the open-source core for free
Webclaw is a web extraction toolkit that turns any website into clean, structured data. Output formats include Markdown, JSON, HTML, plain text, and an LLM-optimized mode that strips noise and cuts token count by around 90% vs raw HTML.
Webclaw uses HTTP with TLS fingerprint impersonation instead of spinning up a headless browser. That means sub-200ms response times, zero browser overhead, and no Selenium or Playwright dependency. Content extraction runs via readability scoring plus a 9-step pipeline, so most pages need no browser at all.
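To make the idea of dropping noise before it reaches a model concrete, here is a minimal sketch using only Python's standard library. It is not Webclaw's pipeline or its readability scoring: the `NOISE_TAGS` set, the `ContentOnly` class, and the `reduction` helper are all hypothetical names chosen for this illustration, and character count stands in for token count.

```python
from html.parser import HTMLParser

# Tags treated as boilerplate in this toy example.
# This list is an assumption for illustration, not Webclaw's rule set.
NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class ContentOnly(HTMLParser):
    """Collects text that is not nested inside any noise element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of noise elements we are currently inside
        self.chunks = []  # text fragments kept so far

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the page text with noise elements removed, one fragment per line."""
    parser = ContentOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def reduction(html: str, text: str) -> float:
    """Fraction of characters removed (a rough proxy for token savings)."""
    return 1 - len(text) / len(html)
```

Calling `extract_text(page_html)` and then `reduction(page_html, text)` gives a rough feel for how much of a page is markup and chrome. A real extractor scores content blocks rather than relying on a fixed tag list.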
The open-source version (AGPL-3.0) runs locally on your own hardware with no limits, so you can try the full engine for free without a card. The managed API is paid, starting at $19/mo for Starter, and you can cancel any time from the billing portal.
Yes, you can self-host. Webclaw is open source under AGPL-3.0. You can run the CLI, REST API server, or MCP server on your own infrastructure. Docker images and one-line deploy scripts are available.
Six formats: Markdown (clean readable text), JSON (structured with metadata), HTML (sanitized), plain text, LLM-optimized (stripped of noise for AI consumption), and raw HTML. The LLM format runs a 9-step optimization pipeline to minimize token usage.
Webclaw ships a Model Context Protocol server binary that exposes 12 tools: scrape, crawl, map, batch, extract, summarize, diff, brand, search, research, vertical_scrape, and list_extractors. Works with any MCP client (Claude Desktop, Claude Code, Cursor, Windsurf, Codex, Antigravity) over stdio.
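Many stdio MCP clients, including Claude Desktop and Cursor, read a JSON config with an `mcpServers` map. The sketch below generates such an entry. The binary path `/path/to/webclaw-mcp` and the `WEBCLAW_API_KEY` variable are placeholders assumed for illustration, so check the Webclaw docs for the real binary name and settings.

```python
import json
from typing import Optional


def mcp_client_config(command: str, api_key: Optional[str] = None) -> dict:
    """Build an `mcpServers` entry that launches a stdio MCP server.

    `command` is the path to the server binary (hypothetical here).
    `api_key`, if given, is passed through a hypothetical env variable.
    """
    server = {"command": command, "args": []}
    if api_key:
        # Assumed variable name, used only for this illustration.
        server["env"] = {"WEBCLAW_API_KEY": api_key}
    return {"mcpServers": {"webclaw": server}}


if __name__ == "__main__":
    # Paste the printed JSON into your MCP client's config file.
    print(json.dumps(mcp_client_config("/path/to/webclaw-mcp"), indent=2))
```

The client starts the binary as a child process and talks to it over stdin and stdout, so no port or URL appears in the config.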
Your extracted content is never stored or logged on our servers. Requests are processed in real-time and the response is returned directly to you. If you use LLM features, content is sent to the AI provider for processing but is not retained. For full control, self-host the entire stack.
Webclaw can use language models to extract structured JSON from pages using a schema you define, answer questions about page content with prompt-based extraction, or generate summaries. It chains through local Ollama first, then falls back to cloud providers.
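The local-first, cloud-fallback order can be pictured as a simple provider chain. This is a sketch of the general pattern, not Webclaw's code: `run_chain`, `AllProvidersFailed`, and the stub providers are hypothetical, and the stubs stand in for real Ollama and cloud API calls.

```python
from typing import Callable, List, Sequence, Tuple

# A provider takes a prompt and returns the model's text, or raises on failure.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def run_chain(prompt: str, providers: Sequence[Tuple[str, Provider]]) -> Tuple[str, str]:
    """Try providers in order and return (provider_name, output) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers for demonstration only.
def local_ollama(prompt: str) -> str:
    raise ConnectionError("no local model server running")


def cloud_provider(prompt: str) -> str:
    return f"summary of: {prompt}"


if __name__ == "__main__":
    name, output = run_chain(
        "page text", [("ollama", local_ollama), ("cloud", cloud_provider)]
    )
    print(name, output)
```

Putting the local provider first keeps content on your own machine whenever a local model is available, and the cloud call only happens when that attempt fails.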
Cancel anytime. Migrate from Firecrawl in 60 seconds with the compatibility layer.
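As a rough picture of what the compatibility layer implies, the sketch below builds the same scrape request against two base URLs. It assumes Firecrawl's public `POST /v2/scrape` shape (a JSON body with `url` and `formats`, plus a Bearer token). The host `https://webclaw.example` is a placeholder, not a real endpoint, and `build_scrape_request` is a hypothetical helper. Nothing is sent over the network.

```python
import json
import urllib.request


def build_scrape_request(base_url: str, api_key: str, target_url: str) -> urllib.request.Request:
    """Build (but do not send) a Firecrawl-style /v2/scrape request."""
    body = json.dumps({"url": target_url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v2/scrape",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Only the base URL differs between the two requests.
    before = build_scrape_request("https://api.firecrawl.dev", "KEY", "https://example.com")
    after = build_scrape_request("https://webclaw.example", "KEY", "https://example.com")  # placeholder host
    print(before.full_url)
    print(after.full_url)
```

With an official SDK, the equivalent change is usually a single base-URL setting on the client, with the rest of the calling code left untouched.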