Firecrawl is an open source web scraping and crawling API that you can self-host. It turns any website into clean, LLM-ready markdown or structured JSON, and it works as a self-hosted alternative to scraping services like ScrapingBee and Apify.

What is Firecrawl?

Firecrawl is an open source API that turns websites into clean, LLM-ready data. Point it at one URL and it returns the page as markdown, HTML, a screenshot, or structured JSON; point it at a domain and it crawls every reachable page from a single request. It renders JavaScript and dynamic content and hands back the cleaned page content itself, so you don’t write or maintain brittle CSS selectors.

What is Firecrawl best for?

Firecrawl is best for developers building AI, RAG, and agent pipelines who need web content as clean markdown or structured JSON without standing up their own headless-browser fleet. It’s the right fit when you’d otherwise glue together Playwright, proxy rotation, and HTML-to-markdown parsing, or when you want to feed live web data straight into an LLM.

What can Firecrawl do?

Scrape a single URL to markdown, HTML, structured JSON, or a full-page screenshot
Crawl an entire site from one request, following links and subpages
Map a domain to discover every URL before you scrape it
Search the web and return full-page content for each result
Extract structured data against a JSON schema you define
Interact with a page by running actions (click, scroll, type, wait) before extracting
Batch-scrape thousands of URLs asynchronously
Call it from official SDKs (Python, Node.js, Go, Rust, Java, Elixir), an MCP server for AI agents, or automation tools like n8n
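
As a rough illustration of the single-URL scrape listed above, here is a minimal Python sketch that builds and sends one request using only the standard library. The base URL (localhost:3002), the /v1/scrape route, and the payload fields are assumptions for illustration, not details stated on this page, so check them against the API reference for the version you run.

```python
import json
import urllib.request

# Assumptions (not stated in the text above): a self-hosted instance on
# localhost:3002 and a versioned /v1/scrape route. Adjust for your deployment.
BASE_URL = "http://localhost:3002"


def build_scrape_request(url, formats=("markdown",), base_url=BASE_URL, api_key=None):
    """Build (but do not send) a POST request asking for one page in the given formats."""
    body = json.dumps({"url": url, "formats": list(formats)}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # The cloud API expects a key; a local instance may not need one.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/v1/scrape", data=body, headers=headers, method="POST"
    )


def scrape(url, **kwargs):
    """Send the request and return the decoded JSON response."""
    request = build_scrape_request(url, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling scrape("https://example.com") against a running instance should return a JSON document that contains the page content. Building the request separately from sending it makes the payload easy to inspect or log before anything touches the network.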

Is Firecrawl free?

Yes — Firecrawl is free to self-host under the AGPL-3.0 license, and you only pay for your own server. The managed Firecrawl Cloud is a paid, credit-based API: a free tier gives 1,000 credits a month, and paid plans start at $16/mo (Hobby, 5,000 pages), then $83/mo (Standard, 100,000 pages) and up. One credit usually equals one scraped page.
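
To compare the paid plans on equal terms, it helps to convert them to a price per 1,000 pages. The short sketch below uses only the prices and page counts quoted above and assumes one credit per page, which the text says is the usual case.

```python
# Plan figures as quoted above: (monthly price in USD, pages included).
PLANS = {"Hobby": (16, 5_000), "Standard": (83, 100_000)}


def cost_per_thousand_pages(price_usd, pages):
    """Return the effective price in USD for 1,000 scraped pages."""
    return price_usd / pages * 1000


for name, (price, pages) in PLANS.items():
    print(f"{name}: ${cost_per_thousand_pages(price, pages):.2f} per 1,000 pages")
# Hobby: $3.20 per 1,000 pages
# Standard: $0.83 per 1,000 pages
```

On these figures the Standard plan costs roughly a quarter as much per page as Hobby, which is a useful baseline when weighing the cloud against the cost of your own server.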

Where does Firecrawl fall short?

The self-hosted build is deliberately less capable than the cloud. It has no access to Fire-engine, so advanced anti-bot handling, IP-block avoidance, and stealth proxying are on you — and the /agent and /browser endpoints aren’t supported.
Structured-output features — JSON format, the /extract endpoint, summaries — require you to wire in your own LLM provider (such as an OpenAI key) when self-hosting.
It isn’t a single service: a working self-hosted deployment runs the API alongside Redis and a separate Playwright browser service, so there are several moving parts to keep healthy.
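
Because a self-hosted setup spans several processes, a small reachability check can help show which piece is down. The sketch below only tests whether a TCP port accepts connections. The host and port values are hypothetical placeholders, not taken from this page, so replace them with the ones your deployment actually uses.

```python
import socket

# Hypothetical host/port pairs for illustration only; substitute the values
# from your own deployment.
SERVICES = {
    "api": ("localhost", 3002),
    "redis": ("localhost", 6379),
    "playwright-service": ("localhost", 3000),
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=SERVICES):
    """Map each service name to whether its port currently accepts connections."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}
```

Running check_services() returns a dictionary of service names mapped to True or False. An open port only shows that something is listening there; it does not prove the service behind it is healthy.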

What does Firecrawl replace?

Firecrawl is a self-hosted alternative to hosted scraping APIs like ScrapingBee and Apify. It does the same turn-a-URL-into-data job — JavaScript rendering, crawling, and structured extraction — but you can run it on your own infrastructure under an open source license instead of paying per credit. Several open source crawlers, such as Crawl4AI, cover similar ground.

FAQ

Is Firecrawl open source? Yes. The core engine is licensed under AGPL-3.0 (its client SDKs are MIT), so the code is public and you can self-host, audit, or modify it. The managed Firecrawl Cloud layers extra features on top.

Can I self-host Firecrawl for free? Yes. Self-hosting is free under AGPL-3.0; you only pay for the server, plus any proxy or LLM API keys you choose to add. Note that the self-hosted build lacks some cloud-only features, such as Fire-engine and the /agent endpoint.

Is Firecrawl a good ScrapingBee or Apify alternative? For teams that want web data as LLM-ready markdown and prefer to self-host or avoid per-credit pricing, yes. If you need turnkey anti-bot bypass and large proxy pools with no setup, the hosted services (or Firecrawl Cloud) still do that with less work.

What do I need to run Firecrawl? A server with Docker, plus Redis and a Playwright browser service. For structured-extraction features you also add an LLM provider key, and for tougher sites you supply your own proxies.