PDF or image? How to choose the right format for sensitive documents
Should you send the contract as a PDF or a photo? Save the ID scan as JPEG or stick it in a PDF? Each format leaks different things, behaves differently when forwarded, and changes what the recipient can do with it. Here is a working guide to picking the format that fits what you are actually sending.
Most of the file-format choices people make in a day are reflexive. You take a photo of the receipt because the phone is in your hand; you save the contract as a PDF because that is what the export menu offered first. For most files this does not matter. For the small share of files that contain something you would not want a stranger to read - an ID scan, a signed agreement, a medical letter, a bank statement, a payslip - the format you pick shapes what leaks, what stays editable, and what the recipient is able to do with it later.
This guide is a working answer to "PDF or image?" for the documents that actually need a careful answer. We will go through what each format really is, what each one leaks, the specific cases where one is clearly right, and the traps in both. Everything that can be done locally can be done in a browser tab without uploading the file.
What each format actually is
It is worth being concrete, because "PDF" and "image" both cover a lot of ground.
JPEG (and to a lesser extent HEIC on modern iPhones, PNG for screenshots, and WebP for web exports) is a single grid of pixels. There is no notion of pages, text, or layers - just the rendered image and a metadata block. The recipient sees what you saw; they cannot select, search, or copy any of the words inside the image unless they run OCR. JPEG keeps photographic content small, but a page of black-and-white text compresses poorly and often ends up larger than the same page saved as a real-text PDF.
PDF is the opposite. Defined by ISO 32000, it is a container that can hold text, vector shapes, images, form fields, annotations, embedded fonts, attachments, JavaScript, digital signatures, and per-page layers. A PDF of a scanned page can be either a wrapper around a JPEG (essentially "an image with PDF page furniture") or a structured document with real text on it. The two look identical and behave completely differently.
That distinction - "is the text inside this PDF real text or is it a picture of text?" - is the single most important property for almost every decision below. A PDF made by exporting a Word document has selectable, searchable, copyable text. A PDF made by taking a phone photo and saving it as PDF does not, unless something ran OCR over it. The same icon in your file manager hides both.
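One rough way to tell the two kinds of PDF apart without opening a viewer is to look for text-showing operators in the page content. The sketch below is a heuristic built on Python's standard library, not a real PDF parser. It assumes streams are either uncompressed or Flate-compressed, so it will miss encrypted files and other filters. The function name `has_text_operators` is made up for illustration.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of each stream object.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)
# A BT (begin text), then a show-text operator (Tj or TJ), then an ET (end text).
TEXT_RE = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b.*?\bET\b", re.DOTALL)


def _mostly_printable(chunk: bytes) -> bool:
    """Content streams are mostly ASCII operators; image data is not."""
    sample = chunk[:4096]
    if not sample:
        return False
    printable = sum(1 for b in sample if b in (9, 10, 13) or 32 <= b < 127)
    return printable / len(sample) > 0.9


def has_text_operators(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any decodable stream appears to draw text."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            decoded = zlib.decompressobj().decompress(raw)
        except zlib.error:
            decoded = raw  # not Flate data; inspect the bytes as stored
        if _mostly_printable(decoded) and TEXT_RE.search(decoded):
            return True
    return False
```

A `True` result also covers invisible OCR layers, which are real text as far as copy and search are concerned. A `False` result is only a hint, so the manual check of trying to select words in a viewer still applies.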
What each format leaks
Both formats carry hidden information. The categories overlap but the specifics differ.
JPEG and HEIC carry an EXIF block that, by default, can include GPS coordinates, the camera make and model, the body's serial number, the lens, the exposure settings, the OS version, the editing app, timestamps to sub-second precision, and an embedded thumbnail. The thumbnail catches people out: an editor can update the main image and forget to re-render the thumbnail, so the un-cropped or un-blurred version of the photo can sit inside the file. Our long guide on EXIF and how to strip it goes through this in detail.
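To see whether a given JPEG carries such a block, it is enough to walk the segments that sit in front of the pixel data. The following is a minimal sketch using only the standard library. The function name `find_metadata_segments` is hypothetical, it does not decode the individual EXIF tags, it does not handle padding bytes between segments, and it does not cover HEIC.

```python
import struct


def find_metadata_segments(jpeg: bytes) -> list[str]:
    """Describe the APPn and comment segments that precede the image data."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing start-of-image marker)")
    found = []
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: compressed pixels follow
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            found.append(f"APP1 Exif ({len(payload)} bytes)")
        elif marker == 0xE1 and payload.startswith(b"http://ns.adobe.com/xap/1.0/\x00"):
            found.append(f"APP1 XMP ({len(payload)} bytes)")
        elif 0xE0 <= marker <= 0xEF:
            found.append(f"APP{marker - 0xE0} ({len(payload)} bytes)")
        elif marker == 0xFE:
            found.append(f"COM comment ({len(payload)} bytes)")
        pos += 2 + length
    return found
```

An "APP1 Exif" entry is where GPS coordinates, device details, and the embedded thumbnail live, so its presence is the signal that the file needs stripping before it is sent.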
PDF has its own metadata: a Title, Author, Subject, Keywords, Creator (the app that drafted the document), and Producer (the library that wrote the file). A PDF exported from Word usually has the document's original author name and sometimes the company. PDFs from scanning apps include the device model and the app's name. You can see what a given PDF carries with our PDF metadata viewer. Beyond the metadata block, a PDF can also carry:
- Form fields with default values, including fields the drafter forgot to clear.
- Comments and annotations from the review process. These travel with the file even when the comment pane is closed.
- Edit history in the form of incremental updates. Many editors save by appending changes to the end of the file rather than rewriting it, so the bytes of earlier revisions can remain in the saved file, recoverable by anyone who knows what to look for.
- Embedded attachments: a PDF can carry other files inside it, sometimes added unintentionally by a "package" workflow.
- Layers and hidden text: text behind images, text outside the page boundary, watermarks set to invisible. All of it is still in the file.
- JavaScript and form actions, which can do things like submit data to a URL when the file is opened in a permissive viewer.
The summary: a JPEG mostly leaks where and when a photo was taken and with which device. A PDF mostly leaks who drafted what, when, and what they nearly said. They are different leaks and they call for different cleanup.
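A quick triage of a PDF can be done by searching its raw bytes for the names that announce these features. This is a rough sketch under stated assumptions: the `MARKERS` table and the function name `scan_pdf` are illustrative, the list of names is not exhaustive, and names stored inside compressed object streams will not be seen. An empty result is therefore not proof that a file is clean.

```python
import re

# PDF name tokens that hint at content beyond the visible page.
MARKERS = {
    "/AcroForm": "form fields",
    "/Annots": "annotations (comments, links, form widgets)",
    "/EmbeddedFiles": "embedded attachments",
    "/OCProperties": "optional content (layers)",
    "/JavaScript": "JavaScript",
    "/JS": "JavaScript",
    "/OpenAction": "action run on open",
    "/SubmitForm": "form submit action",
}


def scan_pdf(pdf_bytes: bytes) -> dict[str, int]:
    """Count occurrences of feature names in the uncompressed parts of a PDF."""
    findings: dict[str, int] = {}
    for name, label in MARKERS.items():
        # Require a non-alphanumeric byte after the name so "/JS" skips "/JSON".
        pattern = re.escape(name.encode()) + rb"(?![A-Za-z0-9])"
        count = len(re.findall(pattern, pdf_bytes))
        if count:
            findings[label] = findings.get(label, 0) + count
    # More than one end-of-file marker can indicate appended revisions
    # (it also occurs in linearized, "fast web view" files).
    extra_eofs = pdf_bytes.count(b"%%EOF") - 1
    if extra_eofs > 0:
        findings["extra %%EOF markers"] = extra_eofs
    return findings
```

Anything the scan reports is a prompt to open the file in a proper tool and look, not a verdict on its own.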
What each format lets the recipient do
Format also decides what the person on the other end can do without effort.
- Search and copy. A real-text PDF is searchable and copyable. An image - and an image-only PDF - is not, unless someone runs OCR.
- Edit. Both formats can be edited, but in practice editing a PDF without leaving traces requires Acrobat-class software. Editing a JPEG to add or remove content is trivial in any image editor and very hard to detect by eye.
- Re-flow. A PDF can be reformatted by accessibility tools and screen readers if it has structured text. An image is opaque to them.
- Print at correct scale. A PDF carries page size. An image has pixels and a DPI hint that printers and viewers often ignore, which is why a one-page receipt JPEG sometimes prints across two pages.
- Sign. PDFs support cryptographic signatures that prove the file has not been altered since signing. Images do not.
- Combine pages. PDF was built for multi-page documents. Sending five JPEGs of a five-page contract means the recipient has to assemble them, which is where pages go missing.
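The print-scale point comes down to simple arithmetic: physical size is pixels divided by DPI, so the same image prints at very different sizes depending on which DPI the software assumes. A small illustration, with a hypothetical helper name and an A4-sized scan chosen as the example:

```python
def print_size_inches(width_px: int, height_px: int, dpi: float) -> tuple[float, float]:
    """Physical size of an image when printed at the given dots per inch."""
    return (round(width_px / dpi, 2), round(height_px / dpi, 2))


# A 2480 x 3508 pixel scan matches an A4 sheet when printed at 300 DPI...
print(print_size_inches(2480, 3508, 300))  # (8.27, 11.69)
# ...but is several sheets wide and tall if the software assumes 72 DPI.
print(print_size_inches(2480, 3508, 72))   # (34.44, 48.72)
```

A PDF avoids the ambiguity because the page dimensions are stored in the file rather than inferred.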
When the answer is clearly PDF
For these, do not save as image - save as PDF, even if it is one page.
- Anything multi-page. Contracts, IDs with both sides, tenancy agreements, medical reports, court filings. Send the recipient a single PDF, not a numbered set of JPEGs they have to open in order.
- Anything that will be printed. Page size, margins, and orientation matter, and PDF preserves them. JPEGs print to whatever the driver feels like.
- Anything that needs a signature, especially with another party at a distance. Use a real PDF signing flow rather than printing, signing, and scanning - a cryptographically signed PDF is also a tamper-evident record. Our PDF sign tool runs locally for the common case of "drop a signature image into a contract." Note that a pasted signature image on its own is a visual mark, not a cryptographic signature.
- Anything sent to a process that expects PDFs. Banks, tax authorities, schools, embassies. Sending a JPEG means a human has to convert it for you, which often does not happen, or happens to the wrong version.
- Anything where the layout matters as much as the words. Invoices with line items, payslips with a specific structure, lab results with formatted tables. Render once, ship the PDF.
If you start with images - a stack of phone photos of a paper document - the clean path is to turn the images into a single PDF before sending. The result is one file with consistent orientation, page size, and ordering.
When the answer is clearly an image
Image formats win for a narrower set of cases, but they really do win for them.
- A single page where the recipient only needs to look. A photo of a parking ticket sent to your spouse. A screenshot to confirm a booking. A single ID page for a courier. PDF would be overkill and might fail to open inline on whatever app the recipient is using.
- Embedded in a chat or social post. Messaging apps render images inline. Many of them collapse PDFs into a generic file icon, so the recipient has to download and open them.
- Photos and graphics that have no text content. A picture of damage to a parcel, a meter reading, a map screenshot. Wrapping these in PDF is friction for everyone.
- When the recipient is going to OCR or process the pixels. If the receiving system runs OCR on submissions, a clean JPEG or PNG is sometimes easier for them than a PDF with mixed embedded fonts.
For image-only flows, keep two rules: pick the right format, and strip metadata before sending. JPEG is right for photographs (small files, lossy); PNG is right for screenshots and anything with sharp edges (lossless, larger); HEIC is fine to keep on Apple devices but should usually be converted to JPEG for anyone you are not certain can open it. Format conversion and compression both run locally in the browser.
The gray area: image-inside-PDF
A lot of the "sensitive documents" people send fall into a third bucket: a PDF that is just one or more JPEG pages wrapped in PDF page furniture. Phone scanning apps, multi-function printers, and "save as PDF" buttons in image apps all produce these. They look like PDFs and they behave like images.
This combination has specific traps:
- The PDF wrapper has PDF metadata and the inner image keeps its EXIF. Stripping one is not stripping the other. If a phone scanning app embeds the original JPEG with its GPS tags, the PDF you send carries those tags inside the embedded image data.
- File size is often larger than either pure format would have been, because the PDF wrapper adds overhead and the image was not compressed for its real use case.
- The recipient cannot search the text, even though the file is a PDF. This is what trips up people who later try to find a specific clause and come up empty.
- OCR run inside the scanning app sometimes adds an invisible text layer underneath the image. That layer is searchable - which is useful - and also copyable, so anyone with the file can extract the recognized text. Worth knowing when "redacting" by drawing a rectangle on top.
For sensitive image-inside-PDF documents, the safest sequence is: scan, run the PDF through a flatten / re-render step that removes hidden layers, strip the PDF metadata, and check the result by opening it in a viewer and trying to select the words you thought were images.
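The "two layers of metadata" trap can be checked for directly: a JPEG embedded as-is keeps its own header bytes inside the PDF. The sketch below only looks for an EXIF header in such images. It assumes the JPEG is stored without any further encoding applied on top, and the function name `embedded_jpegs_with_exif` is made up for illustration.

```python
import re

# A stream whose data begins with the JPEG start-of-image bytes (FF D8 FF).
JPEG_STREAM_RE = re.compile(
    rb"(?<!end)stream\r?\n(\xff\xd8\xff.*?)endstream", re.DOTALL
)


def embedded_jpegs_with_exif(pdf_bytes: bytes) -> list[bool]:
    """For each JPEG stored as-is in the PDF, report whether it has an EXIF header."""
    return [
        b"Exif\x00\x00" in match.group(1)
        for match in JPEG_STREAM_RE.finditer(pdf_bytes)
    ]
```

Any `True` in the result means the page image still carries its camera metadata, whatever the PDF's own properties say.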
The redaction trap
This is the single most expensive mistake people make with sensitive documents, and it depends entirely on which format you started in.
In an image - a JPEG or PNG - drawing a black rectangle over a name and re-exporting really does delete the pixels underneath. The new JPEG has black where the name was, and the original pixels are gone. The only common gotcha is the embedded thumbnail surviving the edit, which is why you strip metadata after redacting.
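Stripping after redacting can be as simple as copying the JPEG without its metadata segments, since the embedded thumbnail is stored inside the EXIF block. The following is a minimal standard-library sketch, not a replacement for a tested tool. The choice of which segments to drop (`DROP`) and the function name are assumptions made for illustration, and it leaves other APPn segments such as color profiles in place.

```python
import struct

# Segments removed by this sketch: APP1 (Exif and XMP, including the
# embedded thumbnail), APP13 (Photoshop/IPTC), and COM (comments).
DROP = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(jpeg: bytes) -> bytes:
    """Return a copy of the JPEG without the segments listed in DROP."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing start-of-image marker)")
    out = bytearray(jpeg[:2])
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("unexpected byte where a segment marker should be")
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: copy the pixel data untouched
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker not in DROP:
            out += jpeg[pos:pos + 2 + length]
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

One side effect worth knowing: the orientation tag also lives in EXIF, so a photo that relied on it may display rotated after stripping. Rotate the pixels first if that matters.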
In a PDF, drawing a black rectangle with the markup tools in Preview, Word, Acrobat's commenting tools, or most third-party viewers does not delete the text. It puts a black shape on top. The text is still inside the PDF as a separate object, fully selectable and copyable by anyone who opens the file in a different viewer or pastes its content. This is the cause of a long list of public incidents - sealed names in court filings, redacted diplomatic cables, names of confidential informants in police reports - all of which were extracted by selecting the text under the black rectangle.
Real PDF redaction has to delete the underlying objects, not cover them. Use a dedicated PDF redaction tool and verify by trying to select the supposedly redacted text in the saved file. If the redaction is real, there is nothing to select.
Converting between the two
It is common to need to move a document from one format to the other.
Images to PDF: useful when you have phone photos of a paper document and need to send one file. Order the photos, ensure they are right-side up, and combine into a single PDF. If any page contains something sensitive, strip the source images first, because the PDF will embed them.
PDF to images: useful when the recipient needs to glance at a page in a chat, when a system only accepts JPEGs, or when you want to publicly post a single page without the rest of the document leaking through the file structure. Export the pages as JPEG or PNG and send only the pages you intend to.
A note that catches a lot of people: converting a PDF to images and back to PDF is one of the cleanest ways to flatten a document, because it forces every page through a pixel grid. Form fields, comments, hidden text, edit history, JavaScript - all of it disappears. The cost is that the result is no longer searchable, and the file is usually larger.
Sending: the workflow that fits both
Whichever format you settle on, the same five-step routine handles the sensitive cases:
- Decide the format first. Multi-page or signed or printed or layout-sensitive → PDF. Single page, glanceable, sent in a chat → image.
- Redact properly, in a dedicated tool, and verify by trying to select the redacted text. For images, paint over the area with a solid color rather than relying on heavy blur.
- Strip metadata after redacting, so that nothing stale, such as an embedded thumbnail, survives the edit. EXIF for images; document properties, comments, and hidden layers for PDFs. Verify by reopening the cleaned file in a viewer.
- Right-size the file. If you are emailing it, compress the PDF or the image. Large attachments get blocked or re-uploaded somewhere on the way.
- Send via a channel that fits the sensitivity. A passport scan does not belong in a public Slack channel. A signed contract does not belong in a marketing-automation tool. Whatever the format, the channel matters as much as the file.
When the sender and the recipient disagree
The recipient's needs sometimes win even when they conflict with the better format. Two common patterns:
- A counterparty insists on a PDF of a single screenshot. Fine - wrap the image in a PDF wrapper, strip both layers of metadata, and send.
- A recipient cannot open PDFs reliably (older relatives, some mobile email clients, some chat apps). Export the PDF page(s) as JPEG and send the relevant pages. Accept that they have lost the ability to search or forward as a single document.
The goal is not to win the format argument. It is to send the version of the file that the recipient will actually open, in a form that leaks the least.
Two cases worth a sentence each
Passwords on PDFs. A PDF can be encrypted with a password before sending. This is a real protection if the password is delivered out of band (text the password, email the file - not both in the same email) and the password is long. It is not a substitute for sending the file to the right person in the first place. The PDF password tool handles this locally.
Splitting and merging. The cleanest way to share only the relevant page of a longer document is to split the PDF first and send only that page. The cleanest way to consolidate a stack of one-page PDFs is to merge them. Both reduce the risk of accidentally attaching the wrong revision.
The short version
PDF for documents. Image for pictures of single pages or moments. PDF for anything signed, printed, or multi-page. Image for anything you would otherwise drop into a chat. Strip metadata either way. Redact in a real redaction tool, not by drawing rectangles. Verify the result before sending. And do all of it locally, where the file does not have to leave your device to be cleaned up - which is what the rest of the Privvert toolkit is built for.
Frequently asked questions
Is a JPEG of a contract legally valid?
In most jurisdictions, yes - a photo or scan of a signed contract is admissible evidence of the agreement, the same way a faxed copy was for forty years. What changes between PDF and JPEG is not legal validity but workflow: a PDF can carry a digital signature that proves it has not been altered since signing, while a JPEG can only be visually compared. For low-stakes paperwork, an image is fine. For anything where someone might later argue 'that is not what we agreed,' use a PDF and, ideally, a real digital signature.
If I scan a passport, should I save it as PDF or JPEG?
JPEG, if the recipient only needs to look at it. PDF, if it has to sit alongside other pages (visa, supporting documents) or get printed by someone in a queue who expects PDFs. The bigger issue with either choice is metadata: phone photos can embed GPS coordinates and device details in JPEGs, and PDFs from scanning apps can embed device, app, and OS details. Strip both before sending, and never email an ID scan unless you know the recipient is the one who actually needs it.
Why do PDFs sometimes still contain text I thought I had hidden?
Because PDF stores text, vector shapes, and images as separate objects. Drawing a black rectangle over a name in Word, Preview, or Acrobat's markup tools puts the rectangle on top of the text - the text is still there in the file, fully selectable and copyable. Real redaction has to delete the underlying objects, not cover them. This has been the cause of a long list of public incidents, including court filings and government documents.
Are flattened PDFs really safer than the original?
Flattening - re-rendering every page to an image and embedding that single image back inside a PDF wrapper - removes form fields, scripts, comments, annotations, layers, and the structured text. It also removes the ability to search, copy, or reflow the text. For a signed contract going to an outside party, flattening is a useful belt-and-braces step. For an internal working document people still need to edit, it is the wrong choice.
What about screenshots instead of either format?
A screenshot is essentially a JPEG or PNG of whatever was on the screen, so it inherits all of the image-format trade-offs (no signatures, no searchable text, no multi-page support) plus a few extras. Screenshots from phones carry their own metadata (OS version, device model, sometimes a timestamp), and screenshots of documents are often lower resolution than the source. Useful for quick reference; the wrong choice for anything official.
If I'm not sure, what is the safer default?
PDF for anything multi-page, anything that will be printed, anything signed, and anything where formatting must not change in transit. Image (JPEG or PNG) for single pages where the recipient mostly needs to see what it looks like - an ID photo, a screenshot, a single receipt. Either way, strip metadata before sending, and prefer the format that requires the recipient to do the least work to view it correctly.