Modern HTML to PDF conversion
Generating a PDF from HTML used to be a small horror story. PhantomJS was abandoned. wkhtmltopdf was mostly working but stuck on an old WebKit. Most server-side libraries spoke their own dialect of HTML that didn’t match what your designers were building. Anything more complicated than an invoice meant either a third-party service or three weeks of fighting paged.js.
In 2026, the landscape is much cleaner. This is a refreshed version of our 2019 piece on HTML to PDF: the same options, updated for the tooling that’s actually shipping today.
The traditional approaches (and why they’re still mostly retired)
The “classic” stack:
- wkhtmltopdf, based on a Qt port of WebKit from circa 2015. Still works. Still doesn’t render anything Flexbox-based correctly. Fine for invoices that look like they were designed in 2010, painful for anything else. Active maintenance is minimal as of 2026.
- PrinceXML, commercial, expensive, but the gold standard for paginated typesetting. If you’re producing books or technical documents with footnotes, cross-references, and multi-column layouts, Prince is still ahead of every browser engine. License cost rules it out for most SaaS use cases.
- PDFKit / iText / ReportLab / FPDF, programmatic PDF builders. You’re not converting HTML, you’re calling doc.text("Invoice", 50, 50). Works great for documents you can fully control with code; awful for anything where the input is HTML someone else wrote.
If you control the source format, PDFKit-style libraries are still the right answer. If your input is HTML, especially HTML produced by a templating engine or a CMS, you want a real browser engine doing the rendering.
Headless Chrome in 2026
Chrome has had headless mode since 2017, but a lot has changed.
In 2022, Chrome introduced “new headless mode” (--headless=new), a real browser running without a window, sharing the same code path as desktop Chrome. The original headless mode was a separate, simplified renderer that occasionally produced different output from what users would see.
In Chromium 132 (January 2025), new headless became the default. If you’re spawning Chrome today with --headless, you’re getting the real browser. This sounds like a small change, but it eliminated an entire class of “works in the browser, breaks in headless” bugs we used to ship workarounds for.
The practical impact: any modern CSS feature that works in your desktop Chrome works identically in headless rendering. That covers container queries, @layer, the :has() selector, print-color-adjust, variable fonts, color-mix(), CSS nesting, and the View Transitions API (mostly irrelevant for print but harmless). You can build PDF templates with exactly the same skills as web pages.
Puppeteer vs Playwright in 2026
Both are first-class options. Both are maintained. Both can drive headless Chrome to produce PDFs. The differences:
- Puppeteer is Chrome-team-backed, ships fast for new Chrome features, and has the most existing ecosystem (lambdas, layers, examples). API surface is Chrome-specific.
- Playwright is Microsoft-backed, supports Chromium, Firefox, and WebKit through one API, has better auto-waiting semantics, and is the default choice for new test suites. PDF generation works the same as in Puppeteer, but page.pdf() is only available when driving Chromium.
For PDF-only use cases where Chrome is fine, Puppeteer is still slightly leaner because you’re not paying for the multi-browser abstraction. For anything that’s part of a broader test or automation suite, Playwright is the better long-term choice.
A minimum-viable PDF script in either:
```javascript
// Puppeteer
const puppeteer = require('puppeteer');

// CommonJS has no top-level await, so the calls run inside an async function.
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/invoice/123', { waitUntil: 'networkidle0' });
  // Resolves with the PDF bytes; add `path: 'invoice.pdf'` to also write a file.
  const pdf = await page.pdf({
    format: 'A4',
    margin: { top: '25mm', bottom: '25mm', left: '20mm', right: '20mm' },
    printBackground: true,
  });
  await browser.close();
})();
```

```javascript
// Playwright
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  // Playwright spells the equivalent wait condition 'networkidle'.
  await page.goto('https://example.com/invoice/123', { waitUntil: 'networkidle' });
  const pdf = await page.pdf({
    format: 'A4',
    margin: { top: '25mm', bottom: '25mm', left: '20mm', right: '20mm' },
    printBackground: true,
  });
  await browser.close();
})();
```
Almost identical. The Playwright API is slightly more ergonomic for the multi-browser cases that PDFs don’t care about.
Running headless Chrome on your own server
The “easy” stack:
- A Node process that imports Puppeteer
- A bundled Chromium binary (~130 MB)
- About 250 MB of memory per concurrent render
- A queue (BullMQ, SQS, anything) so you don’t fork-bomb the box
The hard parts in production:
- Chrome leaks file descriptors and zombies under load. Use --no-sandbox --disable-dev-shm-usage and a watchdog that recycles the process every N renders.
- Custom fonts need to be installed at the OS level and loaded with @font-face and awaited via document.fonts.ready before you call page.pdf(). Missing one of those three steps is the most common “fonts wrong in production” bug.
- waitUntil: 'networkidle0' doesn’t catch images loaded after a JS callback. For dynamic dashboards, add an explicit await page.waitForSelector('.fully-rendered') checkpoint and trigger it from your template.
- Single-page apps need a render-trigger inside the page; a server-rendered HTML page is much more predictable.
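As a rough illustration of the watchdog, font, and checkpoint items above, here is a minimal sketch rather than a production renderer. The names createRenderer, maxRenders, and readySelector are ours, not Puppeteer's. The launch function is injected so you can pass something like () => puppeteer.launch({ args: [...] }), and the sketch assumes renders run one at a time per renderer, since recycling closes the browser underneath any page still in flight.

```javascript
// Sketch: recycle the browser every `maxRenders` PDFs, and wait for fonts plus
// a template-provided sentinel element before printing.
// Assumption: `launch` returns a Puppeteer-style browser object.
function createRenderer({
  launch,
  maxRenders = 50,                    // hypothetical default, tune from monitoring
  readySelector = '.fully-rendered',  // sentinel your template adds when it is done
  timeoutMs = 30000,
}) {
  let browser = null;
  let renders = 0;

  async function getBrowser() {
    if (browser && renders >= maxRenders) {
      await browser.close(); // recycle before leaked descriptors pile up
      browser = null;
    }
    if (!browser) {
      browser = await launch();
      renders = 0;
    }
    return browser;
  }

  async function render(url, pdfOptions = {}) {
    const b = await getBrowser();
    renders += 1;
    const page = await b.newPage();
    try {
      await page.goto(url, { waitUntil: 'networkidle0', timeout: timeoutMs });
      // Wait inside the page until the fonts requested so far have loaded.
      await page.evaluate(async () => { await document.fonts.ready; });
      // Wait for the explicit "I am done" marker from the template.
      await page.waitForSelector(readySelector, { timeout: timeoutMs });
      return await page.pdf({ format: 'A4', printBackground: true, ...pdfOptions });
    } finally {
      await page.close(); // always release the tab, even on failure
    }
  }

  async function close() {
    if (browser) {
      await browser.close();
      browser = null;
    }
  }

  return { render, close };
}
```

Injecting launch keeps the recycling logic independent of how Chrome is started, which also makes it easy to exercise with a fake browser in tests.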
If you’re rendering more than a few thousand PDFs a day, plan for two boxes minimum (one in failover) and a real queue. Chrome will surprise you.
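Before a real queue is in place, the "don't fork-bomb the box" rule can be approximated inside a single process. The sketch below is a tiny in-process concurrency limiter; createLimiter is a hypothetical helper of ours, and it is not a substitute for BullMQ or SQS once more than one box is involved.

```javascript
// Sketch: cap how many renders run at once in this process.
// Tasks beyond the cap wait in a FIFO list until a slot frees up.
function createLimiter(maxConcurrent) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= maxConcurrent || waiting.length === 0) return;
    active += 1;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)              // also catches synchronous throws from task
      .then(resolve, reject)
      .finally(() => {
        active -= 1;           // free the slot, then start the next waiter
        next();
      });
  };

  // Returns a function that schedules `task` and resolves with its result.
  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
}

// Usage idea: const limit = createLimiter(4);
//             const pdf = await limit(() => renderer.render(url));
```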
Serverless rendering in 2026
The serverless story has improved dramatically since 2019.
- Cloudflare Browser Rendering (GA in 2024) lets a Worker drive a real Chromium instance via a Puppeteer-compatible API, with no cold start and no binary management. Best option for low-volume, latency-sensitive rendering inside a Cloudflare-hosted stack.
- Vercel + @sparticuz/chromium, community fork of chrome-aws-lambda. Works on Vercel, AWS Lambda, Netlify Functions. Cold starts are real (~3–5 seconds for the Chromium binary unzip). Fine for low-volume async renders.
- AWS Lambda with Container Images, package your own Chromium in a 1+ GB container, sidestep the layer size limits. More work, but no cold-start penalty after warm.
- Google Cloud Run, full Chromium binary, generous memory, scales to zero. The most “just works” option for self-hosted, on-demand rendering.
Numbers worth knowing: a single A4 page typically renders in 800 ms–2 s on warm headless Chrome (depending on font loading, image weight, and JS execution). Cold starts add 2–5 s on serverless. Memory peaks around 200–300 MB per render.
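Those figures turn into a back-of-the-envelope capacity estimate fairly directly. The sketch below is only arithmetic; estimateCapacity and its inputs (box size, memory reserved for the OS and Node) are assumptions to replace with your own measurements, and the daily figure is a theoretical ceiling that ignores queueing and cold starts.

```javascript
// Sketch: how many renders fit in RAM at once, and the resulting daily ceiling.
function estimateCapacity({ ramMb, reservedMb, perRenderMb, secondsPerRender }) {
  // Concurrent renders are bounded by the memory left after the reserve.
  const concurrent = Math.max(0, Math.floor((ramMb - reservedMb) / perRenderMb));
  // Each slot completes one render every `secondsPerRender` seconds.
  const perDay = Math.floor((concurrent / secondsPerRender) * 86400);
  return { concurrent, perDay };
}

// Example: a 4 GB box, 1 GB reserved, 300 MB and 2 s per render (the
// pessimistic ends of the ranges quoted above).
const estimate = estimateCapacity({
  ramMb: 4096,
  reservedMb: 1024,
  perRenderMb: 300,
  secondsPerRender: 2,
});
console.log(estimate); // { concurrent: 10, perDay: 432000 }
```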
Building your own PDF microservice
If you decide to self-host, the architecture is well-trodden:
[Web app]
│
│ POST /pdf { url | html, options }
▼
[API gateway + auth]
│
▼
[Queue (BullMQ / SQS)]
│
▼
[Worker pool: Node + Puppeteer + Chromium]
│
▼
[Object storage (S3 / R2 / GCS)]
│
▼
[Pre-signed URL back to caller, or webhook]
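The POST /pdf { url | html, options } contract at the top of that diagram is worth validating before anything reaches the queue. Here is one way it might look; parsePdfJob, the kind field, and the default options are illustrative choices of ours, not part of any library.

```javascript
// Sketch: validate a /pdf request body and normalize it into a queue job.
// Exactly one of `url` or `html` must be present.
function parsePdfJob(body) {
  if (body === null || typeof body !== 'object') {
    throw new TypeError('body must be a JSON object');
  }
  const hasUrl = typeof body.url === 'string';
  const hasHtml = typeof body.html === 'string';
  if (hasUrl === hasHtml) {
    throw new TypeError('provide exactly one of "url" or "html"');
  }

  // Assumed defaults; caller-supplied options override them.
  const options = { format: 'A4', printBackground: true, ...(body.options || {}) };

  if (hasUrl) {
    const parsed = new URL(body.url); // throws TypeError on malformed input
    // Refuse file:, data:, and similar schemes so workers only fetch over HTTP(S).
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      throw new TypeError('url must use http or https');
    }
    return { kind: 'url', url: parsed.href, options };
  }
  return { kind: 'html', html: body.html, options };
}
```

Rejecting bad input at the gateway keeps malformed jobs from occupying a Chrome worker at all.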
Things that will go wrong, in order of likelihood:
- Fonts. (Always fonts.)
- Images that load after networkidle.
- Memory pressure on small boxes when renders queue up.
- Chrome zombies that don’t release ports until you kill -9.
- Customer HTML that includes a script that takes 60 seconds to “settle”, set hard timeouts.
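For the hard-timeout item, one common pattern is to race the render against a timer. This is a sketch under the assumption that the caller cleans up afterwards: withTimeout is a hypothetical helper, and Promise.race only stops waiting, it does not stop Chrome, so on timeout you still need to close the page or recycle the browser.

```javascript
// Sketch: reject if `work` has not settled within `ms` milliseconds.
function withTimeout(work, ms, label = 'render') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is cleared either way so it
  // does not keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Usage idea: const pdf = await withTimeout(page.pdf(opts), 30000, 'pdf');
```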
This is the path Paperplane took in 2019 and it’s the path many teams still take. It works. It’s also a real maintenance burden. If your product isn’t itself a file-conversion API, that maintenance is overhead that doesn’t differentiate you.
Advanced typesetting: paged.js
Browsers do single-page paginated layout well. They do not do book typesetting: running headers that pull content from <h1>s on the page, footnotes that reflow when content moves, cross-references that resolve to page numbers, multi-column layouts with column-balanced overflow.
paged.js is an open-source polyfill for the parts of the CSS Paged Media spec that browsers don’t implement natively. It’s a JavaScript library that runs in the page itself, splits content into pages, and produces a fragmented DOM that headless Chrome then renders to PDF.
Status in 2026: actively maintained, used by university presses, scientific publishers, and a handful of design studios. Worth reaching for if your output looks more like a book than a webpage. Overkill for invoices.
The trade-off: paged.js runs in JavaScript inside the page being rendered. That adds 2–10 seconds per render and makes debugging harder. If you can solve your problem with native CSS print rules, do that first.
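To give a sense of how far native print CSS goes before paged.js is needed, here is a small sketch that wraps a body fragment in a print-ready document. The class names .chapter and .keep-together and the helper wrapForPrint are made up for illustration; the CSS properties themselves are standard.

```javascript
// Sketch: native CSS print rules, embedded in a template helper.
const PRINT_CSS = `
@page { size: A4; margin: 25mm 20mm; }
@media print {
  h1, h2, h3 { break-after: avoid; }                      /* keep headings with their text */
  table, figure, .keep-together { break-inside: avoid; }  /* do not split across pages */
  .chapter { break-before: page; }                        /* each chapter starts a new page */
  thead { display: table-header-group; }                  /* repeat table headers per page */
  body { print-color-adjust: exact; -webkit-print-color-adjust: exact; }
}
`;

// Wraps trusted body HTML in a minimal document that carries the print rules.
function wrapForPrint(bodyHtml) {
  return [
    '<!doctype html>',
    '<html><head><meta charset="utf-8">',
    `<style>${PRINT_CSS}</style>`,
    '</head><body>',
    bodyHtml,
    '</body></html>',
  ].join('\n');
}
```

When rendering a document like this, passing preferCSSPageSize: true to page.pdf() lets the @page size take priority over the format option.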
Commercial paged-media engines
If your output really does need book-quality typesetting and the budget is there:
- PrinceXML, still the leader. Excellent CSS support, footnotes, cross-references. Per-server licensing.
- Antenna House Formatter, XSL-FO and CSS, used in regulated industries (legal, pharma).
- PDFreactor, Java-based, strong CSS Paged Media support, server-license model.
These all produce noticeably nicer output than headless Chrome for paginated documents over ~50 pages. They’re priced for enterprise procurement, not weekend projects.
Comparing the options
| Approach | Setup effort | Render quality | Cost at scale | Best for |
|---|---|---|---|---|
| wkhtmltopdf | Low | Dated | Free | Legacy invoices |
| PDFKit-style libraries | Medium | N/A (you build it) | Free | Fully programmatic docs |
| Puppeteer + headless Chrome (self-hosted) | High | Excellent | Server cost + ops time | Mid-volume, in-house |
| Cloudflare Browser Rendering | Low | Excellent | Per-request | Workers stack, low volume |
| Vercel + Lambda + chromium | Medium | Excellent | Per-request + cold starts | Async, low-to-mid volume |
| PrinceXML / Antenna House | Medium | Best | Per-server license | Books, regulated docs |
| Conversion API (e.g. Converterer) | None | Excellent | Per-conversion | Anyone who doesn’t want to maintain Chrome |
The fork in the road in 2026 is honestly the same as it was in 2019: do you want to be in the business of running Chrome, or do you want to be in the business of whatever your product actually does?
If your conversion volume is high and predictable, the self-hosted route can work out cheaper per render. If it’s spiky, low-volume, or you’d rather your engineers spend time on your product than on Chrome process management, an API is almost always the right call.
Converterer’s website-capture API is one option in that category. Give us a URL, get a pixel-perfect PDF or image back, no Chrome cluster. Free up to 1,000 conversions a month. The HTML→PDF problem we wrote about in 2019 is still real; the answer is just easier now.