
The 'Print to PDF' trap: what your exported PDF still contains - and what a screenshot leaves out

Print to PDF feels like flattening a document to a clean, sealed file. It is not. The PDF that comes out the other side typically still contains the full selectable text under every black box and the original author name in the metadata. Depending on the pipeline, it can also carry editing history, hidden layers from the source application, comments and tracked changes you thought you removed, and - on macOS and Windows - a record of the PDF engine and operating system version that produced it. A screenshot of the same PDF, by contrast, is a flat bitmap with almost none of that. Here is what Print to PDF actually preserves, why a flattened screenshot leaks less in many real cases, when each is the right tool, and how to produce a PDF that is genuinely safe to send.

By the Privvert team · 17 min read

There is a comforting fiction about the Print to PDF button. The fiction goes like this: you have a document in Word or Pages or a browser tab, you choose File > Print > Save as PDF, and the result is a clean, sealed, paper-like artefact - a snapshot of the visible page with no editing history, no author name, no hidden layers, and nothing the recipient can dig into. It is the digital equivalent of pressing the document onto a sheet of paper. What you see is what they get.

The fiction is wrong in almost every detail. A PDF produced by Print to PDF in 2026 is not a snapshot of the page; it is a structured file with the text as selectable text, the images as embedded images, and a metadata block that names the source application, the PDF engine and often the operating system version, the document title, the author, the creation timestamp, and the last-modified timestamp. Depending on the pipeline, layers can survive as separate layers and form fields as live form fields. Every black rectangle you drew over a sensitive name is still sitting on top of the original text, which is still selectable underneath. Every comment you thought you removed may still be there in the document structure. Between them, the /Producer and /CreationDate fields tell the recipient that you saved it from a Mac running Sonoma at 11:47 last Tuesday.

None of this is a bug. PDFs are designed to preserve all of that information, because most of the time you want them to. The problem is that the Print to PDF button looks and feels like a flattening operation, and it is not. This piece walks through what Print to PDF actually preserves, the categories of leakage people most often miss, why a flat screenshot leaks much less in some real situations, and the two-step recipe for producing a PDF that is genuinely safe to send.

What 'Print to PDF' actually does under the hood

On every modern operating system, the Print to PDF button routes the document through the same machinery the OS uses to send pages to a physical printer. The application generates a print job through the platform's page description or drawing layer (PostScript on older systems, Quartz drawing operations on macOS, XPS on Windows, Cairo or Skia on Linux and in browsers), the operating system hands that job to a virtual printer driver named something like 'Save as PDF' or 'Microsoft Print to PDF', and the virtual driver translates the drawing operations into PDF content streams instead of sending them to a physical device.

The crucial detail is what 'translation' means. PDF is not a bitmap format - it is a structured page description language with its own text, vector, image, font, and metadata primitives. When the print driver encounters a run of text in the print job, it does not rasterise the text into pixels and then write the pixels to the PDF. It writes the text as text, with the font, the position, and the character codes preserved, because that is the format PDF natively expects. The same is true for vector shapes (preserved as vectors), embedded images (preserved as compressed image objects), and overlaid drawing layers (preserved in their original z-order with the layer below still present).

This is the right behavior for the case Print to PDF was designed for: producing a high-quality, searchable, accessible, printable artefact of a document. It is the wrong behavior for the case people most often use it for: producing a sealed, flattened, redacted, anonymised artefact for safe sharing. The same machinery that makes the PDF searchable also makes every black rectangle transparent to a determined reader.

The five categories of leakage

Across the documents that end up causing public-facing incidents, the same five categories of leakage recur. None of them are visible in a normal page view.

1. Selectable text under visual redactions

This is the famous one, and it is covered in detail in the dedicated piece on PDF redaction, but it is worth restating because Print to PDF makes it particularly easy to do wrong. A black rectangle drawn in Word, Pages, Google Docs, Preview's markup tools, Acrobat's Comment tools, or the highlighter in a browser PDF viewer is a drawing object. The text underneath is a text object. Both objects survive Print to PDF unchanged, in their original z-order. The visible page shows the rectangle on top. The PDF content stream contains both, and any standard PDF viewer will let the recipient select-all, copy, and paste the underlying text. The list of well-documented public incidents involving exactly this mistake is long enough to fill a separate article - it includes US federal courts, the NSA, the TSA, several major law firms in New York and London, and at least one acquisition disclosure that re-priced a deal.
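
To make the z-order point concrete, here is a minimal sketch in Python. The content stream is hand-written and hypothetical (real streams are usually compressed, and often use hex strings or TJ arrays rather than a plain Tj), and the helper names are ours. It shows a text run followed by a filled black rectangle, and that the text is still there for anything that reads the stream instead of the pixels.

```python
import re

# Hypothetical, hand-written page content stream: a text run, then a filled
# black rectangle painted over the same area of the page.
CONTENT_STREAM = b"""
BT
/F1 12 Tf
72 700 Td
(Witness: Jane Example) Tj
ET
0 0 0 rg
70 695 160 16 re
f
"""

# A literal string operand followed by the Tj (show text) operator.
SHOW_TEXT = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def text_runs(stream: bytes) -> list[str]:
    """Return every literal string passed to Tj, whatever is drawn over it."""
    return [m.group(1).decode("latin-1") for m in SHOW_TEXT.finditer(stream)]


def has_filled_rectangle(stream: bytes) -> bool:
    """True if the stream defines a rectangle (re) and then fills it (f)."""
    return re.search(rb"\bre\s+f\b", stream) is not None


print(has_filled_rectangle(CONTENT_STREAM))  # True
print(text_runs(CONTENT_STREAM))             # ['Witness: Jane Example']
```

The rectangle and the text are two independent instructions. Painting the first does nothing to the second, which is why copy-and-paste recovers the name.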

2. Document metadata

Most PDFs carry a metadata dictionary, called the Document Information Dictionary, that is referenced from the file's trailer. The standard fields are /Title, /Author, /Subject, /Keywords, /Creator (the application that authored the source document - 'Microsoft Word for Mac', 'Pages 14.0', 'LibreOffice 7.6'), /Producer (the engine that wrote the PDF itself - 'Adobe PDF Library 23.0', 'Mac OS X 14.4 Quartz PDFContext', 'Microsoft: Print To PDF'), /CreationDate, and /ModDate. On documents produced via Print to PDF, the /Title and /Author fields are typically inherited from the source application's document properties without being shown to you, and the /Producer field normally records the engine and often its version. A one-line exiftool command on the file reads all of this out in plain text.

On top of that, many PDFs carry a second metadata block in XMP (Extensible Metadata Platform) format - an XML packet embedded as a stream inside the file that can include the original document's instance ID, a history of editing actions, the application versions used at each stage, and on some pipelines the original file path on the author's machine. XMP is what photo editors use to attach edit history to JPEG and TIFF files, and it travels in PDFs the same way.
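
As a rough illustration of how little effort reading both blocks takes, the sketch below scans raw PDF bytes with the Python standard library. It is a simplification under stated assumptions: the sample bytes are made up, the Info values are assumed to be plain literal strings in an uncompressed object, and the function names are ours. A real tool such as exiftool also handles UTF-16 strings, hex strings, encryption, and object streams.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

INFO_KEYS = ("Title", "Author", "Subject", "Keywords",
             "Creator", "Producer", "CreationDate", "ModDate")
PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})"
    r"(?:([+\-Z])(?:(\d{2})'?(\d{2})?'?)?)?")


def read_info_fields(data: bytes) -> dict[str, str]:
    """Find '/Key (value)' pairs for the standard Info keys (first match wins)."""
    found = {}
    for key in INFO_KEYS:
        m = re.search(rb"/" + key.encode() + rb"\s*\(((?:\\.|[^\\()])*)\)", data)
        if m:
            found[key] = m.group(1).decode("latin-1")
    return found


def parse_pdf_date(value: str) -> datetime:
    """Parse a full PDF date string such as D:20240312114700+01'00'."""
    m = PDF_DATE.match(value)
    if not m:
        raise ValueError(f"not a full PDF date: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    sign, off_h, off_m = m.group(7), m.group(8), m.group(9)
    if sign is None:
        tz = None                      # no zone recorded
    elif sign == "Z":
        tz = timezone.utc
    else:
        offset = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(offset if sign == "+" else -offset)
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)


def extract_xmp_packet(data: bytes) -> Optional[str]:
    """Return the XMP packet, if one is stored uncompressed, else None."""
    start = data.find(b"<?xpacket begin")
    if start == -1:
        return None
    end = data.find(b"<?xpacket end", start)
    close = data.find(b"?>", end) if end != -1 else -1
    if close == -1:
        return None
    return data[start:close + 2].decode("utf-8", errors="replace")


# Made-up sample bytes standing in for a real file read with open(path, "rb").
SAMPLE = (
    b"%PDF-1.7\n1 0 obj\n"
    b"<< /Title (Q3 draft - do not circulate) /Author (J. Example)\n"
    b"/Creator (Microsoft Word) /Producer (Microsoft: Print To PDF)\n"
    b"/CreationDate (D:20240312114700+01'00') >>\nendobj\n"
    b"2 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
    b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/"></x:xmpmeta>'
    b'<?xpacket end="w"?>\nendstream\nendobj\n'
)

info = read_info_fields(SAMPLE)
print(info["Author"], "|", info["Producer"])
print(parse_pdf_date(info["CreationDate"]).isoformat())
```

The timestamp is worth a second look: the offset in /CreationDate reveals the time zone of the machine that produced the file, not just the time.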

3. Layers, form fields, and optional content

PDFs support optional content groups - the spec's name for layers - which let a single PDF carry multiple toggleable visual states. CAD exports, GIS maps, multi-language documents, and engineering drawings routinely have ten or twenty layers, only one of which is visible at a time in the default view. When you Print to PDF from a viewer showing such a document, the behavior is viewer-dependent: some viewers flatten to the visible state, some preserve all the layers in the output, and some preserve the layers but hide all the non-visible ones in a way that a casual reader will not notice but an inspector will. The safe assumption is that layers survive unless you explicitly flatten.

Form fields are a related trap. A fillable PDF that you have completed in Acrobat or Preview, and then re-printed via Print to PDF, often comes out the other side with the form fields still live - the recipient can click into the field and edit your answer, or worse, can read the form's default value and any JavaScript validation attached to the field. If your intent is to send a filled-in form that the recipient cannot edit, you need an explicit Flatten Form Fields step, not a re-print.

3a. Digital signatures (which Print to PDF silently breaks)

A digital signature on a PDF is a cryptographic signature, made with the signer's private key, over a digest of a specified byte range of the file. The signature is valid only as long as those bytes are unchanged. Re-printing a signed PDF via Print to PDF produces a new PDF that is visually identical but is byte-for-byte different; the new file does not contain the signature object at all, and any signature blocks that survive as visible page elements are now just decorative rectangles with no cryptographic meaning. The recipient sees a 'signed' document that is not actually signed, and the original signer's name and timestamp are still printed on the page. This is a real risk for contracts, audit reports, and regulatory filings.
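
The byte-level dependence can be sketched in a few lines of Python. This is a deliberately reduced model, not a signature verifier: a real PDF signature wraps the digest in a CMS/PKCS#7 structure and is checked against a certificate chain, and the sample bytes and the signed_digest helper here are hypothetical. What it does show accurately is the /ByteRange idea: the digest covers everything except the hole where the signature value itself is stored.

```python
import hashlib


def signed_digest(data: bytes, byte_range: tuple[int, int, int, int]) -> str:
    """SHA-256 over the two spans that a /ByteRange [a b c d] array describes."""
    a, b, c, d = byte_range
    return hashlib.sha256(data[a:a + b] + data[c:c + d]).hexdigest()


# Hypothetical signed file: content, a placeholder hole for the signature, trailer.
before = b"%PDF-1.7\n(page content as the signer saw it)\n"
hole = b"<" + b"0" * 16 + b">"
after = b"\n%%EOF\n"
byte_range = (0, len(before), len(before) + len(hole), len(after))
digest_at_signing = signed_digest(before + hole + after, byte_range)

# Writing the signature value into the hole does not disturb the digest.
signed = before + b"<" + b"A" * 16 + b">" + after
print(signed_digest(signed, byte_range) == digest_at_signing)   # True

# A reprint has the same visible words but different bytes and no signature object.
reprinted = b"%PDF-1.4\n(page content as the signer saw it)\n%%EOF\n"
print(hashlib.sha256(reprinted).hexdigest() == digest_at_signing)  # False
```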

4. Comments, tracked changes, and hidden text

Word, Pages, and Google Docs all keep tracked changes and comments as a parallel structure to the document text. The print pipeline's behavior depends on the source application's print-time setting: if 'Print markup' is on, the comments are rendered to the visible page (and become part of the PDF the recipient sees); if it is off, the comments are typically dropped from the rendered page. What is less reliable is whether the comments survive in the PDF structure when they are not rendered. In some pipelines they are dropped entirely. In others - particularly the Acrobat plugin chain on Windows - they are preserved as PDF annotation objects that do not show on the page but are visible in the Comments pane of any PDF viewer that supports them.

Hidden text is a related category. Word documents often contain text in headers, footers, hidden text runs (Format > Font > Hidden), text in white-on-white, text inside collapsed outline sections, and text in document properties that the author never realised was being saved. The print pipeline's treatment of each of these is application-specific and version-specific, and the only reliable way to know what survived is to open the resulting PDF and inspect it.
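
One way to inspect for the white-on-white and invisible-text cases is to walk a page's content stream and note the fill colour and text rendering mode in force when each string is shown. The Python sketch below is a heuristic under loud assumptions: the stream is hand-written and already decompressed, only the rg, g, Tr, and Tj operators are tracked, graphics state save/restore (q/Q) is ignored, and white fill only means hidden if the page behind it is white.

```python
import re

# Literal strings, or any other run of non-space characters.
TOKEN = re.compile(rb"\((?:\\.|[^\\()])*\)|[^\s()]+")


def _numbers(tokens):
    """Convert operand tokens to floats, or return None if any is not numeric."""
    try:
        return [float(t.decode("ascii")) for t in tokens]
    except ValueError:
        return None


def suspicious_text(stream: bytes) -> list[tuple[str, str]]:
    """Return (reason, text) for strings shown with white fill or render mode 3."""
    findings, operands = [], []
    white_fill = invisible = False
    for match in TOKEN.finditer(stream):
        tok = match.group(0)
        if tok[:1] in b"(/+-.0123456789":      # strings, names, numbers
            operands.append(tok)
            continue
        if tok == b"rg":                        # RGB fill colour
            white_fill = _numbers(operands[-3:]) == [1.0, 1.0, 1.0]
        elif tok == b"g":                       # grey-level fill colour
            white_fill = _numbers(operands[-1:]) == [1.0]
        elif tok == b"Tr":                      # 3 = neither fill nor stroke
            invisible = _numbers(operands[-1:]) == [3.0]
        elif tok == b"Tj" and operands and operands[-1].startswith(b"("):
            text = operands[-1][1:-1].decode("latin-1")
            if invisible:
                findings.append(("render mode 3", text))
            elif white_fill:
                findings.append(("white fill", text))
        operands.clear()                        # operands belong to one operator
    return findings


SAMPLE_STREAM = b"""
BT /F1 11 Tf 72 700 Td 0 g (Visible paragraph) Tj ET
BT /F1 11 Tf 72 680 Td 1 1 1 rg (internal note: hold until Friday) Tj ET
BT /F1 11 Tf 72 660 Td 0 g 3 Tr (OCR or hidden run) Tj ET
"""

for reason, text in suspicious_text(SAMPLE_STREAM):
    print(f"{reason}: {text}")
```

Render mode 3 is also how OCR layers are normally stored, so a hit is a prompt to look closer rather than proof of a leak.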

5. Embedded fonts, attached files, and JavaScript

PDFs can carry embedded font subsets, which on rare occasions can leak the corporate font license text or the foundry's licensing metadata. They can carry attached files - the PDF spec lets you bolt an entire Excel workbook or a folder of images onto a PDF as an attachment, and some invoicing and EDI workflows do this routinely. They can carry JavaScript, which can fire on open and do things like populate form fields, validate input, or attempt outbound HTTP requests (most modern viewers block or prompt for the latter, but the JavaScript is still in the file as evidence of intent). Print to PDF from most viewers will strip attachments and JavaScript, and Print to PDF from the source application (Word, Pages) usually has nothing to strip, because the source application does not generate them in the first place. The safer assumption is that any PDF whose origin you do not control may carry any of the above.

Why a screenshot leaks less, and where it does not

A screenshot is a flat raster bitmap. The format (PNG, JPEG, HEIC) has none of PDF's structure: no text layer, no layers, no form fields, no annotations, no embedded JavaScript, no attached files, no XMP edit history. The only metadata it carries is the screenshot tool's own - a creation timestamp, sometimes an OS version, and on iOS/macOS the screen dimensions and the device model. That metadata exists, and on a photo it can include GPS coordinates and the camera's serial number (covered in the EXIF article), but a software-generated screenshot of a window has nothing comparable to leak.
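
For PNG specifically, the claim is easy to check, because the format is a simple sequence of typed chunks and the metadata-bearing ones have well-known names. The following Python sketch builds a tiny 1x1 PNG in memory (with a made-up Software entry from a hypothetical screenshot tool) and lists its chunks; the helper names are ours, and JPEG and HEIC store their metadata differently.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Chunk types that hold text, timestamps, or EXIF rather than pixels.
METADATA_CHUNKS = {"tEXt", "zTXt", "iTXt", "tIME", "eXIf"}


def png_chunk(kind: bytes, payload: bytes) -> bytes:
    """Serialise one chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def list_chunks(data: bytes) -> list[tuple[str, int]]:
    """Return (chunk type, payload length) for every chunk in a PNG."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks, pos = [], len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, kind = struct.unpack(">I4s", data[pos:pos + 8])
        chunks.append((kind.decode("ascii"), length))
        pos += 12 + length          # 4 length + 4 type + payload + 4 CRC
    return chunks


def metadata_chunks(data: bytes) -> list[str]:
    """Names of the chunks that carry metadata rather than image data."""
    return [kind for kind, _ in list_chunks(data) if kind in METADATA_CHUNKS]


# A 1x1 greyscale PNG with one text chunk, standing in for a real screenshot.
sample = (
    PNG_SIGNATURE
    + png_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + png_chunk(b"tEXt", b"Software\x00Hypothetical Screenshot Tool")
    + png_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    + png_chunk(b"IEND", b"")
)

print([kind for kind, _ in list_chunks(sample)])  # ['IHDR', 'tEXt', 'IDAT', 'IEND']
print(metadata_chunks(sample))                    # ['tEXt']
```

The whole inventory fits on one line, which is the point: there is no place in the file for a text layer, a comment thread, or an edit history to hide.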

For sharing a single piece of evidence - one paragraph of a long document, one chart, one error message, one diagram - a screenshot is often the right answer. Whatever is visible in the image is what the recipient gets. Nothing in the image can be selected, edited, or inspected for more than is already visible. The original document's full text, comments, metadata, and editing history simply do not travel with the picture.

The trade-offs are real, and they are the reason a screenshot is not the universal answer. Screenshots cost the recipient: no selectable text means no copy-paste, no machine-readable structure, no accessibility for screen readers, larger files for multi-page content, and lower quality if the recipient prints. They cost the sender on bulk documents - a 40-page contract as 40 PNGs is unwieldy. They lose vector quality on diagrams, which look fine on screen but pixellate when zoomed or printed. The trade-off discussion lives in the dedicated PDF or image piece, but the short version is: screenshot for evidence, cleaned PDF for documents the recipient needs to use.

The two-step recipe for a PDF that is safe to send

A PDF that is genuinely safe to send is one where the visible content is the only content - no underlying selectable text under redactions, no author metadata, no layers, no form fields, no attachments, no comments. There is a reliable recipe.

Step one: clean at the source

  1. Accept or reject all tracked changes and remove all comments in the source application before exporting. In Word: Review > Accept All Changes, then Delete All Comments. In Pages: Edit > Track Changes > Accept All, then Insert > Comment > Delete All. In Google Docs: File > Version history > See version history will tell you what is in there; File > Make a copy creates a clean version.
  2. Do real redactions in a tool that strips the underlying content stream, not by drawing rectangles. Acrobat's Redact tool removes the text from the content stream and leaves a black box with nothing underneath, rather than a rectangle over text. The Privvert PDF redact tool does the same locally in the browser. The full reasoning and the list of well-known incidents from doing it wrong is in the PDF redaction article.
  3. Empty the document's metadata fields in the source application before exporting. In Word: File > Info > Check for Issues > Inspect Document opens the Document Inspector; use it to remove document properties and personal information. In Pages: File > Advanced > Remove Pages metadata (the exact name varies by version). In Google Docs, the metadata is server-side and the cleanest path is to copy the text into a new document.
  4. Turn off 'Print markup' / 'Print comments' / 'Print hidden text' in the print dialog before saving. If your application offers a 'Save preview thumbnail' option, turn it off - some older export pipelines embedded the source application's preview thumbnail in the PDF itself.

Step two: flatten through a render pass

Even with a clean source, the export still produces a structured PDF with metadata, fonts, and potentially layers. The reliable way to flatten everything in one pass is to render every page to an image and then re-assemble those images into a fresh PDF. The result is a PDF where every page is a single image - no text layer, no metadata beyond the new PDF's own, no layers, no form fields, no annotations, no JavaScript, and no attachments. The cost is that the recipient cannot select text, and the file is larger.

The render-and-reassemble pass can be done locally in the browser with the Privvert PDF-to-images tool followed by the images-to-PDF tool. No upload, no server in the middle. The same effect is achievable with command-line tools (pdftoppm followed by img2pdf). Acrobat's Sanitize Document feature is a lighter alternative: in one click it strips metadata, attachments, JavaScript, hidden layers, and form fields. It does not flatten the text layer, but for most threat models a Sanitize pass, followed by a check that the metadata is actually gone, is enough.
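
For the command-line route, a minimal Python wrapper might look like the sketch below. It assumes pdftoppm (from Poppler) and img2pdf are installed and on the PATH; the function names, the 200 dpi default, and the "page" file prefix are our choices, not anything the tools require.

```python
import subprocess
import tempfile
from pathlib import Path


def rasterise_command(src: Path, prefix: Path, dpi: int = 200) -> list[str]:
    """pdftoppm call that renders every page of src to <prefix>-<n>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(src), str(prefix)]


def reassemble_command(pages: list[Path], dest: Path) -> list[str]:
    """img2pdf call that wraps the page images, in order, into one PDF."""
    return ["img2pdf", *map(str, pages), "-o", str(dest)]


def flatten_pdf(src: Path, dest: Path, dpi: int = 200) -> None:
    """Render src to page images, then rebuild an image-only PDF at dest."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(rasterise_command(src, prefix, dpi), check=True)
        # pdftoppm zero-pads page numbers, so a plain sort gives page order.
        pages = sorted(Path(tmp).glob("page-*.png"))
        if not pages:
            raise RuntimeError("pdftoppm produced no page images")
        subprocess.run(reassemble_command(pages, dest), check=True)


# Example call, left commented out because it needs both tools and a real file:
# flatten_pdf(Path("report.pdf"), Path("report-flat.pdf"))
```

Higher dpi values keep small print legible at the cost of file size. The output still carries the metadata that img2pdf writes for the new file, so the inspection checks remain worth running on the result.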

For documents where the visible content really is everything you want to send - a one-page invoice, a single screenshot of a chart, a copy of an ID - skip the PDF entirely and send a flat PNG or JPEG. Strip its EXIF first if it came from a phone or camera (covered in the photo metadata piece), and you have a file that cannot leak anything beyond the pixels you see.

How to inspect a PDF before you send it

Before any meaningful document leaves your machine, three quick checks catch most of the categories above.

  1. Select All, then drag-copy across what you think is redacted or flattened. If the cursor changes to a text cursor over your black boxes, the text underneath is selectable. If paste produces the original text, the redaction is cosmetic.
  2. Read the metadata. The Privvert local PDF metadata tool reads /Title, /Author, /Subject, /Keywords, /Creator, /Producer, /CreationDate, /ModDate, and the XMP block without uploading the file. On the command line, exiftool report.pdf or pdfinfo report.pdf reads the Info fields, and pdfinfo -meta report.pdf prints the XMP block. If any of those fields contain a name, an email, a file path, an OS version, or a timestamp you do not want the recipient to see, strip them before sending.
  3. Inspect the structure. Acrobat's Preflight, the command-line qpdf (--list-attachments for attached files, --json for the full object structure), or any of the open-source PDF structure viewers will let you find optional content groups (layers), attached files, embedded JavaScript, form fields, and digital signatures. Note that qpdf --check only validates the file's syntax; it does not list these features. If anything in that list should not be in the file, remove it before sending.
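
As a quick first pass on the structure check, the feature names can be searched for directly in the raw bytes. The Python sketch below is a coarse heuristic with a made-up sample and our own marker list: a hit means the feature is worth investigating, but a miss proves nothing, because modern PDFs often pack their objects into compressed object streams where these names are not visible without a real parser.

```python
import re

# PDF name objects that signal each feature, with a plain-language meaning.
MARKERS = {
    "/OCProperties": "optional content groups (layers)",
    "/AcroForm": "interactive form fields",
    "/EmbeddedFiles": "attached files",
    "/JavaScript": "embedded JavaScript",
    "/Annots": "annotations such as comments and markup",
    "/ByteRange": "digital signature",
    "/Metadata": "XMP metadata stream",
}


def scan_markers(data: bytes) -> dict[str, str]:
    """Return the markers found in uncompressed parts of the file."""
    hits = {}
    for name, meaning in MARKERS.items():
        # The lookahead stops /AcroForm from matching a longer name.
        if re.search(re.escape(name.encode()) + rb"(?![A-Za-z0-9])", data):
            hits[name] = meaning
    return hits


# Made-up catalog object standing in for a file read with open(path, "rb").
sample = b"1 0 obj\n<< /Type /Catalog /AcroForm 5 0 R /OCProperties << /OCGs [6 0 R] >> >>\nendobj\n"

for name, meaning in scan_markers(sample).items():
    print(f"{name}: {meaning}")
```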

The checks take a minute. The cost of skipping them is a recipient who can paste the redacted name into a chat, read the author's identity in the metadata, and tell their colleague which version of Word was used to write the document.

Where this fits

Print to PDF sits at the intersection of two recurring patterns: a convenient default that does the opposite of what most people assume, and a document format that was designed to preserve everything because most users want everything preserved. The redaction-specific failure mode is in the PDF redaction piece; the broader question of when to send a PDF and when to send an image is in PDF or image; the equivalent metadata problem on photos is in removing photo metadata; and the larger question of what 'delete' actually accomplishes when the file lives in a backup, a cache, or a counterpart's mailbox is in the delete piece.

Privvert's PDF tools all run locally in the browser - the file never leaves the device, which matters most for exactly the kind of document you are thinking about cleaning. The reasoning is on the privacy page, and the rest of the practical guides are on the blog.

How this article was written

Written by the Privvert team. Technical claims were checked against primary specifications and tested where possible; product behaviour was verified against current versions on the publication date; historical and news claims are sourced from named outlets, agency advisories, or primary documents. No part of this article was generated by an AI and posted as-is. Read the full editorial guidelines.