
The web scraper your AI agent deserves.

Clean markdown or JSON from any URL, with 90% fewer tokens than raw HTML. A drop-in Firecrawl replacement, no headless browser.

Cancel anytime · Self-host forever · Open source core

Production-ready on real-world sites like these

Nike · Airbnb · Etsy · Shopify · IKEA · Target · TripAdvisor · Zillow · IMDb · Rotten Tomatoes · Yelp · Goodreads · Wikipedia · GitHub · Product Hunt · Y Combinator · Stripe · Spotify
See it work

Drag the seam. Watch a page become tokens.

A live page is mostly noise your agent still pays for: nav, ads, cookie walls, footer soup. Webclaw keeps the content and drops the rest. Same content, a fraction of the tokens.

amazon.it/dp/B0GSS4M55K · extract.md
# Apple AirPods Max 2
Wireless Over-Ear · Active Noise Cancelling
Adaptive Audio · Spatial Audio · Translation
Price €529.49 (RRP €579, 9% off)
Rating 4.5 out of 5 stars
Brand Apple
Colour Midnight
Model AirPods Max 2
## Buying options
- New · €529.49
- Used, Good · €503.02
[Add to Basket](amazon.it/…)
raw · 402.1k tokens
1 request · 2.1× smaller
Product

Everything an agent needs. Nothing it doesn’t.

Fast by default, smart when needed.

Static pages come back instantly. JS-heavy sites only render when they have to, with nothing to configure.

A drop-in Firecrawl replacement.

Swap one base URL, keep your SDK. /v2 matches Firecrawl's request and response shape exactly, so no rewrite is needed.
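As a rough illustration of the base-URL swap, the sketch below builds the same scrape request against two hosts without sending it. The `/v2/scrape` path and the `url`/`formats` body fields follow Firecrawl's public v2 API as far as we know; the Webclaw host `https://api.webclaw.example` and both API keys are placeholders, not documented values, so check the real endpoint before relying on this.

```python
import json
import urllib.request

FIRECRAWL_BASE = "https://api.firecrawl.dev"
WEBCLAW_BASE = "https://api.webclaw.example"  # placeholder host (assumption)


def build_scrape_request(base_url: str, api_key: str, target: str) -> urllib.request.Request:
    """Build (but do not send) a Firecrawl-style v2 scrape request."""
    body = json.dumps({"url": target, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v2/scrape",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical keys; only the host and credentials change between the two.
before = build_scrape_request(FIRECRAWL_BASE, "fc-key", "https://example.com")
after = build_scrape_request(WEBCLAW_BASE, "wc-key", "https://example.com")
print(before.full_url)
print(after.full_url)
```

Path, method and body are identical in both requests, which is what "drop-in" would mean in practice if the compatibility claim holds.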

Best-in-class bot protection.

Challenge pages, CAPTCHAs and fingerprinting handled transparently. No cookies, no per-site config.

Every format, every extraction.

Markdown, JSON, text, LLM-optimized. Schema & prompt extraction, summaries, diffing, across 14 endpoints.

Built for AI agents.

An MCP server with 12 tools for Claude, Cursor, Codex & any client. REST for search, batch and crawl.

90% fewer tokens.

A 9-step pipeline strips the nav, ads and boilerplate, keeping the real content fully intact.

Agentic scraping.

Give it a goal, get structured data back. The agent reasons over the page, then clicks and navigates to reach the data.

Deep content recovery.

Embedded JSON & server-rendered payloads recovered even from an empty DOM. Auto-detects PDF, DOCX, XLSX.

Pricing

Pay for pages, not seats.

One credit pool covers every endpoint. Start in minutes, or self-host the open-source stack with no limits.

Save 20%
Starter
$19/mo
  • Credits 10,000/mo
  • Research 3 runs/mo
  • Max sources 10
  • Concurrency 5
  • Support Email
Growth (Popular)
$49/mo
  • Credits 100,000/mo
  • Research 10 runs/mo
  • Max sources 20
  • Concurrency 20
  • Support Priority
Pro
$99/mo
  • Credits 250,000/mo
  • Research 20 runs/mo
  • Max sources 30
  • Concurrency 50
  • Support Priority
Scale
$399/mo
  • Credits 1,000,000/mo
  • Research 60 runs/mo
  • Max sources 100
  • Concurrency 100
  • Support Priority + Slack

Cancel anytime · no lock-in · self-host the open-source core for free
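For comparing tiers, a small sketch like the one below works out the price per 1,000 credits and picks the lowest-priced plan whose pool covers a given monthly need. The prices and credit pools are the ones listed above; how many credits a given page or endpoint consumes is not stated here, so the input is expressed in credits rather than pages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    usd_per_month: int
    credits_per_month: int

    @property
    def usd_per_1k_credits(self) -> float:
        return round(self.usd_per_month / self.credits_per_month * 1000, 3)


# Monthly prices and credit pools as listed in the pricing section.
PLANS = [
    Plan("Starter", 19, 10_000),
    Plan("Growth", 49, 100_000),
    Plan("Pro", 99, 250_000),
    Plan("Scale", 399, 1_000_000),
]


def cheapest_plan(credits_needed: int) -> Optional[Plan]:
    """Return the lowest-priced plan whose pool covers the need, or None."""
    fitting = [p for p in PLANS if p.credits_per_month >= credits_needed]
    return min(fitting, key=lambda p: p.usd_per_month, default=None)


for plan in PLANS:
    print(plan.name, plan.usd_per_1k_credits)
```

Per 1,000 credits this comes to $1.90 on Starter, $0.49 on Growth, about $0.396 on Pro and about $0.399 on Scale.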

Common questions

Questions, answered.

Webclaw is a web extraction toolkit that turns any website into clean, structured data. Output formats include Markdown, JSON, HTML, plain text, and an LLM-optimized mode that strips noise and cuts token count by around 90% vs raw HTML.

Webclaw uses HTTP with TLS fingerprint impersonation instead of spinning up a headless browser. Sub-200ms response times, zero browser overhead, no Selenium or Playwright dependency. Content extraction runs via readability scoring plus a 9-step pipeline, so most pages need no browser at all.
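To make the idea of dropping non-content markup concrete, here is a deliberately naive sketch using Python's standard `html.parser`. It is not Webclaw's readability scoring or its 9-step pipeline, whose steps are not described here; it only skips text nested inside a hand-picked set of tags (an assumption) and keeps everything else.

```python
from html.parser import HTMLParser

# Tags treated as noise in this toy example (an assumption, not Webclaw's list).
NOISE_TAGS = {"nav", "footer", "aside", "script", "style", "header"}


class ContentText(HTMLParser):
    """Collect text that is not nested inside any noise tag."""

    def __init__(self):
        super().__init__()
        self._noise_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._noise_depth > 0:
            self._noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._noise_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the kept text chunks, one per line."""
    parser = ContentText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = (
    "<nav>Home | Deals</nav>"
    "<main><h1>AirPods Max 2</h1><p>Price 529.49 EUR</p></main>"
    "<footer>Cookie settings</footer>"
)
print(extract_text(page))
```

A real extractor has to score content blocks rather than trust tag names, since many sites put navigation and ads in plain `div` elements.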

The open-source version (AGPL-3.0) runs locally on your own hardware with no limits, so you can try the full engine for free without a card. The managed API is paid, starting at $19/mo for Starter, and you can cancel any time from the billing portal.

Yes. Webclaw is open source under AGPL-3.0. You can run the CLI, REST API server, or MCP server on your own infrastructure. Docker images and one-line deploy scripts are available.

Six formats: Markdown (clean readable text), JSON (structured with metadata), HTML (sanitized), plain text, LLM-optimized (stripped of noise for AI consumption), and raw HTML. The LLM format runs a 9-step optimization pipeline to minimize token usage.

Webclaw ships a Model Context Protocol server binary that exposes 12 tools: scrape, crawl, map, batch, extract, summarize, diff, brand, search, research, vertical_scrape, and list_extractors. Works with any MCP client (Claude Desktop, Claude Code, Cursor, Windsurf, Codex, Antigravity) over stdio.
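MCP clients that launch servers over stdio are usually pointed at a binary through a small JSON config. The sketch below generates an entry in the `mcpServers` format used by Claude Desktop. The binary path `/usr/local/bin/webclaw-mcp` and the server key `webclaw` are placeholders, since the actual binary name and any required arguments are not given here.

```python
import json


def mcp_server_entry(name: str, command: str, args=None, env=None) -> dict:
    """Return a Claude Desktop style `mcpServers` config for one stdio server."""
    entry = {"command": command, "args": list(args or [])}
    if env:
        entry["env"] = dict(env)
    return {"mcpServers": {name: entry}}


# Hypothetical install path; substitute wherever the server binary actually lives.
config = mcp_server_entry("webclaw", "/usr/local/bin/webclaw-mcp")
print(json.dumps(config, indent=2))
```

Other MCP clients use similar but not identical config files, so treat the output as a starting point rather than something to paste everywhere.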

Your extracted content is never stored or logged on our servers. Requests are processed in real-time and the response is returned directly to you. If you use LLM features, content is sent to the AI provider for processing but is not retained. For full control, self-host the entire stack.

Webclaw can use language models to extract structured JSON from pages using a schema you define, answer questions about page content with prompt-based extraction, or generate summaries. It chains through local Ollama first, then falls back to cloud providers.
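The local-first, cloud-fallback order described above can be sketched as a simple provider chain. This is an illustration of the pattern, not Webclaw's code: the provider functions are stand-ins that never contact Ollama or any cloud API, and the names `complete_with_fallback`, `local_ollama` and `cloud_provider` are made up for the example.

```python
from typing import Callable, Iterable, Tuple

Provider = Tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Iterable[Provider]) -> Tuple[str, str]:
    """Try each provider in order; return (name, answer) from the first that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers for illustration only; neither talks to a real service.
def local_ollama(prompt: str) -> str:
    raise ConnectionError("no local model running")


def cloud_provider(prompt: str) -> str:
    return f"summary of: {prompt}"


used, answer = complete_with_fallback(
    "page text",
    [("ollama", local_ollama), ("cloud", cloud_provider)],
)
print(used, answer)
```

Putting the local provider first keeps page content on your machine whenever a local model is available, and only sends it to a cloud provider when the local call fails.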

Partners

Backing open web extraction

ColdProxy · NodeMaven · Quantum Proxies · Proxy-Seller · RapidProxy

Ship your agent today. Scrape forever.

Cancel anytime. Migrate from Firecrawl in 60 seconds with the compatibility layer.
