Question 1

What is webclaw?

Accepted Answer

Webclaw is a web extraction toolkit that turns any website into clean, structured data. Output formats include Markdown, JSON, HTML, plain text, and an LLM-optimized mode that strips noise and cuts token count by around 90% vs raw HTML.

Question 2

How is webclaw different from headless browser scrapers?

Accepted Answer

Webclaw uses HTTP with TLS fingerprint impersonation instead of spinning up a headless browser. That means sub-200ms response times, no browser overhead, and no Selenium or Playwright dependency. Content is extracted with readability scoring plus a 9-step pipeline, so most pages never need a browser at all.
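The readability idea can be illustrated with a small, self-contained heuristic. This is a minimal sketch under stated assumptions, not webclaw's scorer. It uses Python's standard `html.parser`, treats a hypothetical set of tags as candidate content blocks, and scores each block by its text length discounted by link density, with a small bonus per comma. The names `BlockCollector`, `score`, and `extract_main_text` are illustrative only.

```python
from html.parser import HTMLParser

# Candidate content containers and tags whose text is ignored. Both sets are
# assumptions for this sketch, not webclaw's actual configuration.
BLOCK_TAGS = {"article", "main", "section", "div", "nav", "aside", "header", "footer"}
SKIP_TAGS = {"script", "style", "noscript"}


class BlockCollector(HTMLParser):
    """Records, for each candidate block, its text and how much of it is link text."""

    def __init__(self) -> None:
        super().__init__()
        self.open_blocks = []  # stack of blocks still open, innermost last
        self.blocks = []       # blocks that have been closed
        self.link_depth = 0
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a":
            self.link_depth += 1
        elif tag in BLOCK_TAGS:
            self.open_blocks.append({"tag": tag, "text": [], "link_chars": 0})

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        elif tag in BLOCK_TAGS and self.open_blocks:
            self.blocks.append(self.open_blocks.pop())

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth or not self.open_blocks:
            return
        # Credit the text to the innermost open block only.
        block = self.open_blocks[-1]
        block["text"].append(text)
        if self.link_depth:
            block["link_chars"] += len(text)


def score(block) -> float:
    text = " ".join(block["text"])
    if not text:
        return 0.0
    link_density = block["link_chars"] / len(text)
    # Long text with few links scores high; link-heavy nav and footers score low.
    # A small bonus per comma favors flowing prose over short labels.
    return len(text) * (1.0 - link_density) + 10 * text.count(",")


def extract_main_text(html: str) -> str:
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    candidates = parser.blocks + parser.open_blocks  # include any unclosed blocks
    if not candidates:
        return ""
    best = max(candidates, key=score)
    return " ".join(best["text"])
```

Link density does most of the work here: navigation bars and footers are almost entirely link text, so they score near zero even when they are long. Full readability implementations add many more signals, such as class and id name hints and propagating scores up to parent elements.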

Question 3

Can I try it before paying?

Accepted Answer

The open-source version (AGPL-3.0) runs locally on your own hardware with no limits, so you can try the full engine for free without a credit card. The managed API is paid, starting at $19/mo for the Starter plan, and you can cancel at any time from the billing portal.

Question 4

Can I self-host webclaw?

Accepted Answer

Yes. Webclaw is open source under AGPL-3.0. You can run the CLI, REST API server, or MCP server on your own infrastructure. Docker images and one-line deploy scripts are available.

Question 5

What output formats are supported?

Accepted Answer

Six formats: Markdown (clean readable text), JSON (structured with metadata), HTML (sanitized), plain text, LLM-optimized (stripped of noise for AI consumption), and raw HTML. The LLM format runs a 9-step optimization pipeline to minimize token usage.
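The LLM format's 9 steps are not listed in this FAQ, so the sketch below only illustrates the general shape of such a pipeline: an ordered list of small text transforms applied one after another. The six steps shown are hypothetical stand-ins, the regex-based tag stripping is deliberately naive, and `approx_tokens` uses the common rule of thumb of roughly four characters per token rather than a real tokenizer.

```python
import html
import re
from typing import Callable, List

Step = Callable[[str], str]


def drop_scripts_and_styles(text: str) -> str:
    # Remove <script>/<style> elements together with their contents.
    return re.sub(r"(?is)<(script|style)\b.*?</\1>", "", text)


def drop_comments(text: str) -> str:
    return re.sub(r"(?s)<!--.*?-->", "", text)


def strip_tags(text: str) -> str:
    # Replace remaining tags with a space so adjacent words don't merge.
    return re.sub(r"<[^>]+>", " ", text)


def decode_entities(text: str) -> str:
    # Runs after strip_tags, so a decoded "&lt;" stays as literal text.
    return html.unescape(text)


def collapse_whitespace(text: str) -> str:
    # Squeeze runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def dedupe_lines(text: str) -> str:
    # Keep only the first occurrence of each line (e.g. repeated banners).
    seen = set()
    out = []
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            out.append(line)
    return "\n".join(out)


# Hypothetical step order; each step is a plain str -> str function.
PIPELINE: List[Step] = [
    drop_scripts_and_styles,
    drop_comments,
    strip_tags,
    decode_entities,
    collapse_whitespace,
    dedupe_lines,
]


def run_pipeline(raw: str, steps: List[Step] = PIPELINE) -> str:
    text = raw
    for step in steps:
        text = step(text)
    return text


def approx_tokens(text: str) -> int:
    # Crude estimate: about 4 characters per token for English text.
    return max(1, len(text) // 4) if text else 0
```

Keeping each step as a plain `str -> str` function makes the pipeline easy to reorder or extend and lets each step be tested on its own. Note that `dedupe_lines` would also drop legitimately repeated lines, which is the kind of trade-off a real pipeline has to tune.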

Question 6

How does the MCP integration work?

Accepted Answer

Webclaw ships a Model Context Protocol server binary that exposes 12 tools: scrape, crawl, map, batch, extract, summarize, diff, brand, search, research, vertical_scrape, and list_extractors. It communicates over stdio and works with any MCP client, including Claude Desktop, Claude Code, Cursor, Windsurf, Codex, and Antigravity.
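Under the hood, an MCP client talks to a stdio server by writing JSON-RPC 2.0 messages to the server's stdin, one message per line, and reading responses from its stdout. The sketch below builds those messages for a `scrape` call. It assumes the tool takes a `url` argument (the server reports each tool's real input schema via `tools/list`), uses `2024-11-05` as one published MCP protocol revision, and uses a made-up client name. It does not launch the webclaw binary, whose command name is not given here.

```python
import json
from typing import Any, Dict, Optional

PROTOCOL_VERSION = "2024-11-05"  # one published MCP revision; newer ones exist


def request(req_id: int, method: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """A JSON-RPC 2.0 request; the server replies with the same id."""
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}


def notification(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """A JSON-RPC 2.0 notification: no id, and no response expected."""
    msg: Dict[str, Any] = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def frame(message: Dict[str, Any]) -> bytes:
    """Encode one message for the stdio transport as a single UTF-8 line."""
    # json.dumps escapes newlines inside strings, so none end up embedded.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def initialize_request() -> Dict[str, Any]:
    return request(1, "initialize", {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1"},  # hypothetical
    })


def initialized_notification() -> Dict[str, Any]:
    return notification("notifications/initialized")


def scrape_call(req_id: int, url: str) -> Dict[str, Any]:
    # "url" is an assumed argument name for webclaw's scrape tool.
    return request(req_id, "tools/call", {"name": "scrape", "arguments": {"url": url}})
```

A client sends `initialize_request()`, waits for the server's response, sends `initialized_notification()`, and only then issues `tools/call` requests, writing each message with `frame()`. In practice an MCP client such as Claude Desktop handles all of this once it is pointed at the server binary.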

Question 7

Is my data private?

Accepted Answer

Your extracted content is never stored or logged on our servers. Requests are processed in real time and the response is returned directly to you. If you use LLM features, content is sent to the AI provider for processing but is not retained. For full control, self-host the entire stack.

Question 8

What are LLM extraction features?

Accepted Answer

Webclaw can use language models to extract structured JSON from a page using a schema you define, answer questions about page content through prompt-based extraction, or generate summaries. It tries a local Ollama instance first, then falls back to cloud providers.
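The local-first fallback can be sketched as a chain of providers tried in order, where a provider counts as failed if it raises or if its reply does not match the schema. In this sketch the providers are plain callables and the names `ollama` and `cloud` are placeholders. A real local provider would call Ollama's HTTP API (served on localhost port 11434 by default), and the three-type schema format used here is an assumption, not webclaw's schema syntax.

```python
import json
from typing import Any, Callable, Dict, List, Sequence, Tuple

# A provider takes a prompt and returns the model's raw text reply.
Provider = Callable[[str], str]

# Minimal hypothetical schema: field name -> "string" | "number" | "boolean".
Schema = Dict[str, str]


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain errors or returns invalid output."""


def build_prompt(page_text: str, schema: Schema) -> str:
    fields = ", ".join(f'"{name}" ({kind})' for name, kind in schema.items())
    return (
        f"Extract these fields from the page as one JSON object: {fields}.\n"
        "Reply with JSON only.\n\n"
        f"Page:\n{page_text}"
    )


def check_type(value: Any, kind: str) -> bool:
    if kind == "string":
        return isinstance(value, str)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "boolean":
        return isinstance(value, bool)
    raise ValueError(f"unsupported schema type: {kind}")


def validate(reply: str, schema: Schema) -> Dict[str, Any]:
    data = json.loads(reply)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, kind in schema.items():
        if name not in data or not check_type(data[name], kind):
            raise ValueError(f"missing or mistyped field: {name}")
    return data


def extract(page_text: str, schema: Schema,
            providers: Sequence[Tuple[str, Provider]]) -> Tuple[str, Dict[str, Any]]:
    """Try providers in order (local first) and return the first valid result."""
    prompt = build_prompt(page_text, schema)
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, validate(call(prompt), schema)
        except Exception as exc:  # connection error, bad JSON, schema mismatch
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Validating the reply before accepting it matters as much as the ordering: a local model that answers with prose instead of JSON is treated the same as one that is not running, so the chain moves on to the next provider.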
